Convert accented characters to English

chulett · Post by **chulett** » Wed Oct 10, 2007 10:38 am

One suggestion - use Convert rather than EReplace, that way you'll only need one statement. Check the online help for the gory details.

gagan8877 · Post by **gagan8877** » Wed Oct 10, 2007 10:39 am

chulett wrote:One suggestion - use Convert rather than EReplace, that way you'll only need one statement. Check the online help for the gory details.

Already tried, does not work.

chulett · Post by **chulett** » Wed Oct 10, 2007 10:53 am

Post your syntax - the suggestion was strictly in regards to using one rather than a bajillion inline function calls.

gagan8877 · Post by **gagan8877** » Wed Oct 10, 2007 1:02 pm

chulett wrote:Post your syntax - the suggestion was strictly in regards to using one rather than a bajillion inline function calls.

The syntax still contains a billion functions:

Convert(Char(135),'c',
Convert(Char(128),'C',
Convert(Char(221),'Y',
Convert(Char(150),'u',
Convert(Char(151),'u',
Convert(Char(219),'U',
Convert(Char(217),'U',
Convert(Char(147),'o',
Convert(Char(162),'o',
Convert(Char(212),'O',
Convert(Char(211),'O',
Convert(Char(140),'i',
Convert(Char(206),'I',
Convert(Char(136),'e',
Convert(Char(130),'e',
Convert(Char(138),'e',
Convert(Char(202),'E',
Convert(Char(144),'E',
Convert(Char(200),'E',
Convert(Char(131),'a',
Convert(Char(160),'a',
Convert(Char(133),'a',
Convert(Char(194),'A',
Convert(char(193),'A',
Convert(char(192),'A', Seq2Xmf.FirstName)))))))))))))))))))))))))

But the point is that without CONVERT and without EREPLACE the end result is the same. Both of these are proving useless.

Raftsman · Post by **Raftsman** » Wed Oct 10, 2007 1:54 pm

You could always create a lookup fileset and compare and replace with the character you want.

gagan8877 · Post by **gagan8877** » Wed Oct 10, 2007 3:16 pm

Raftsman wrote:You could always create a lookup fileset and compare and replace with the character you want.

As I explained earlier, DS is not reading the accented characters correctly, so I don't think the junk character can match to the lookup.

ray.wurlod · Post by **ray.wurlod** » Wed Oct 10, 2007 5:24 pm

Try it without Char(128) in the mix. Char(128) is (by default) DataStage's internal representation of NULL, and Microsoft's internal representation of the Euro character. With NLS enabled this is mapped to the private use area.

You only need one Convert function. Set up two stage variables, one containing a string of the accented characters, the other containing a string of their replacements. Here's a cut-down example.
svAccentedChars initialized to Char(150):Char(151):Char(219):Char(217)
svReplaceChars initialized to "uuUU"
Don't derive the stage variables for each row - let them keep their initial values.
Your derivation expression is then

Code: Select all

Convert(svAccentedChars,svReplaceChars,InLink.TheString)
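For readers who want to see the positional mapping spelled out, here is a minimal Python sketch of what a single Convert call is described as doing in this thread: each character in the first list is swapped for the character at the same position in the second list. The function `convert` and the variable names are illustrative only and are not DataStage code. The handling of a shorter replacement list (unmatched characters are dropped) and of duplicate characters is an assumption and should be checked against the online help.

```python
def convert(from_chars: str, to_chars: str, text: str) -> str:
    """Rough Python model of a positional Convert(from, to, text)."""
    mapping = {}
    for i, ch in enumerate(from_chars):
        # Assumption: the first occurrence of a duplicated character wins.
        if ch not in mapping:
            # Assumption: characters with no partner in to_chars are deleted.
            mapping[ch] = to_chars[i] if i < len(to_chars) else ""
    # Characters that are not in the lookup list pass through unchanged.
    return "".join(mapping.get(ch, ch) for ch in text)


# Mirrors the cut-down example: Char(150):Char(151):Char(219):Char(217) -> "uuUU"
sv_accented_chars = chr(150) + chr(151) + chr(219) + chr(217)
sv_replace_chars = "uuUU"

sample = "S" + chr(150) + "R"
result = convert(sv_accented_chars, sv_replace_chars, sample)
print(result)  # SuR
```

In this model a plain name such as "ANNIE" comes back untouched, because none of its characters appear in the lookup list.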

chulett · Post by **chulett** » Wed Oct 10, 2007 6:56 pm

gagan8877 wrote:
chulett wrote:Post your syntax - the suggestion was strictly in regards to using one rather than a bajillion inline function calls.
The syntax still contains a billion functions:

Convert(Char(135),'c',
<snip>

Not if you set it up correctly. The online help shows it takes a list of characters to look for and a corresponding (matched) list of replacement characters, all in one call.

Ray has spelled out the best way to do this, in the initial value of two stage variables and then leave the stage variable Derivations blank so the strings are only evaluated once. Hopefully, the advice about the Char(128) is the key here...

gagan8877 · Post by **gagan8877** » Thu Oct 11, 2007 3:52 pm

chulett wrote:
gagan8877 wrote:
chulett wrote:Post your syntax - the suggestion was strictly in regards to using one rather than a bajillion inline function calls.
The syntax still contains a billion functions:

Convert(Char(135),'c',
<snip>
Not if you set it up correctly. The online help shows it takes a list of characters to look for and a corresponding (matched) list of replacement characters, all in one call.

Ray has spelled out the best way to do this, in the initial value of two stage variables and then leave the stage variable Derivations blank so the strings are only evaluated once. Hopefully, the advice about the Char(128) is the key here...

Thanks for the valuable input Ray. I tried that, but the results were not as expected:

"RNALDAAAAAAAAAAAAAAAAAAA"

Even the English names (non-accented) got changed. Examples:

"MARTYNEAAAAAAAAAAAAAAAAAA"
"MARTINAAAAAAAAAAAAAAAAAAA"
"ANNIEAAAAAAAAAAAAAAAAAAAA"

The code is as under:

Transformer Variables:

svAccentedChars
char(192):char(193):Char(194):Char(133):Char(160):Char(131):Char(200):Char(144):Char(202):Char(138):Char(130):Char(136):Char(206):Char(140):Char(211):Char(212):Char(162):Char(147):Char(217):Char(219):Char(151):Char(150):Char(221):Char(135)

svReplaceChars
"AAAaaaEEEeeeIiOOooUUuuYc"

Derivation:

Convert(svAccentedChars,svReplaceChars, AUDIT_In_NA.NA_FIRST_NAME)

Please let me know what I am doing wrong here.

ray.wurlod · Post by **ray.wurlod** » Fri Oct 12, 2007 2:03 pm

I still believe the single Convert() function can do it. Can you take a look at some of your accented data with a hex editor and confirm the byte codes being used?
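If a hex editor is not to hand, a few lines of Python can show the same information. The sketch below prints the byte codes of a sample name under three common encodings. The name "RÉNALD" and the choice of encodings are assumptions for illustration; the thread does not state which encoding the source file actually uses. The point is that the same accented letter has a different byte code in each encoding, so the Char() values in the lookup list must match whatever the file really contains.

```python
def hex_codes(raw: bytes) -> str:
    """Return the bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02X}" for b in raw)


# Hypothetical sample value; \u00c9 is an upper-case E with an acute accent.
name = "R\u00c9NALD"

cp850_hex = hex_codes(name.encode("cp850"))
latin1_hex = hex_codes(name.encode("latin-1"))
utf8_hex = hex_codes(name.encode("utf-8"))

print("cp850  :", cp850_hex)   # 52 90 4E 41 4C 44
print("latin-1:", latin1_hex)  # 52 C9 4E 41 4C 44
print("utf-8  :", utf8_hex)    # 52 C3 89 4E 41 4C 44
```

To inspect a real file, read it in binary mode (for example `open(path, "rb").read()`, where `path` is a placeholder for your own file) and pass the bytes to `hex_codes`. Here hex 90 is decimal 144 and hex C9 is decimal 201, and those decimal values are what Char() expects.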
