XML-Input Stage: wrong UTF-8 encoding

Posted: Thu Apr 19, 2007 6:20 am
by stivazzi
Hi All,
my Server job reads an XML file whose characters are correctly UTF-8 encoded (e.g. the Trade Mark symbol is correctly encoded as the 3 bytes e284a2). After the XML-Input stage, which splits out only some XPaths rather than all the data, I found that the characters are no longer correctly encoded (the same TM symbol is transformed into the single character '1a'). For this test I used the useful editor fhred to inspect the encoded data.
I've also tried setting a user variable called "NLS_LANG" in the job parameters, with the value 'American_America.WE8ISO8859P1', 'AMERICAN_AMERICA.UTF8' or 'AMERICAN_AMERICA.WE8MSWIN1252', but it seems that the XML stage ignores this variable.

Any help will be appreciated!

Thanks,
Andrea
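
For readers following along, the byte values quoted above can be reproduced outside DataStage. The Python sketch below is only an illustration: `encode_with_sub` is a hypothetical helper, not anything the XML-Input stage is known to do internally. It shows that the Trade Mark sign (U+2122) is e2 84 a2 in UTF-8, has no ISO-8859-1 representation, and that replacing an unmappable character with 0x1a (the ASCII SUB control character) yields the single byte described in the post.

```python
TM = "\u2122"  # TRADE MARK SIGN

# UTF-8 uses three bytes for this code point.
print(TM.encode("utf-8").hex())   # e284a2

# Windows-1252 has a slot for it; ISO-8859-1 does not.
print(TM.encode("cp1252").hex())  # 99


def encode_with_sub(text: str, encoding: str) -> bytes:
    """Hypothetical helper: encode text, writing 0x1A (ASCII SUB)
    for every character the target encoding cannot represent."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)
        except UnicodeEncodeError:
            out.append(0x1A)
    return bytes(out)


print(encode_with_sub("Brand" + TM, "iso-8859-1"))  # b'Brand\x1a'
```

If the stage is converting to a single-byte character set somewhere along the way, this is the kind of output one would expect to see, but that is an assumption rather than a confirmed diagnosis.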

Posted: Thu Apr 19, 2007 6:33 am
by chulett
The NLS_LANG approach is the correct one. How exactly did you try to 'set it' in your job? You should be setting the value in the Administrator as a User Defined Environment variable, adding it as a parameter to your job and then overriding the default value there.

Posted: Thu Apr 19, 2007 7:05 am
by stivazzi
chulett,
I set the NLS_LANG variable as a user-defined variable and added it to my server job. The problem seems to be that after the XML-Input stage, the data is not correctly interpreted.
seqFile1(xml)-->XML-Input-->Transformer-->seqFile2

In seqFile1 the TM symbol is correctly encoded with 3 bytes.
In seqFile2 the same symbol is wrongly encoded with only 1 byte.

Thanks,
Andrea
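
A quick way to compare the two sequential files without a hex editor is to count the relevant byte sequences. This is a minimal sketch: the function name `summarize` and the sample byte strings are made up for illustration, and in practice the bytes would be read from the real input and output files in binary mode.

```python
TM_UTF8 = bytes.fromhex("e284a2")  # Trade Mark sign in UTF-8
SUB = b"\x1a"                      # ASCII SUB, the byte seen in the output


def summarize(data: bytes) -> dict:
    """Report whether data decodes as UTF-8 and how often the
    UTF-8 Trade Mark sequence and the SUB byte occur in it."""
    try:
        data.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        valid = False
    return {
        "valid_utf8": valid,
        "tm_utf8": data.count(TM_UTF8),
        "sub_bytes": data.count(SUB),
    }


# Stand-ins for the contents of seqFile1 and seqFile2.
before = b"<name>Brand\xe2\x84\xa2</name>"
after = b"Brand\x1a"

print(summarize(before))
print(summarize(after))
```

Note that a file containing 0x1a still decodes as valid UTF-8, so the SUB count is the more telling figure: it shows the original character was lost, not merely re-encoded.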

Posted: Thu Apr 19, 2007 7:18 am
by chulett
I've found that I also need to set $LC_CTYPE to C.utf8 to get this to work for me on my server.
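
As a sanity check on the two settings mentioned in this thread, the sketch below inspects a set of environment values and reports whether both name a UTF-8 character set. The helpers `charset_of` and `looks_utf8` are hypothetical, and the check only parses the strings (NLS_LANG has the form LANGUAGE_TERRITORY.CHARSET); it cannot tell whether the DataStage engine actually picks the values up.

```python
def charset_of(value: str) -> str:
    """Return the part after the last '.', or '' if there is none."""
    return value.rpartition(".")[2] if "." in value else ""


def looks_utf8(env: dict) -> bool:
    """True if both NLS_LANG and LC_CTYPE name a UTF-8 character set."""
    nls = charset_of(env.get("NLS_LANG", "")).upper()
    lc = charset_of(env.get("LC_CTYPE", "")).lower().replace("-", "")
    return nls in {"UTF8", "AL32UTF8"} and lc == "utf8"


# Values taken from this thread.
print(looks_utf8({"NLS_LANG": "AMERICAN_AMERICA.UTF8", "LC_CTYPE": "C.utf8"}))
print(looks_utf8({"NLS_LANG": "American_America.WE8ISO8859P1"}))
```

To check a live process, the same function could be passed `dict(os.environ)` from a script started in the same environment as the job.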