ANSI to UTF-8

Posted: Mon Jun 10, 2013 6:31 am
by senthil_tcs
Hello,

I want to convert a sequential file from ANSI to UTF-8 format.
I have set the NLS map to UTF-8 at the project level, and, just to make sure, at the stage level as well. The record delimiter is set to UNIX Newline.

The file is not getting created in UTF-8 format. When I download the test file and open it in the EditPlus or TextPad editor, the file encoding shows as ANSI.

This looks very strange, and I am not sure whether we are missing something. I have managed to replicate the issue using a sample job with a Row Generator and a Sequential File stage. Any help is much appreciated. DataStage version 8.5, OS: AIX.

Thanks,
Senthil

Posted: Mon Jun 10, 2013 6:44 am
by chulett
Have you considered that perhaps your "download" step is affecting the file format? What does it test as if you check it directly on the UNIX server using 'file'?

And you certainly don't need a DataStage job to convert the file; iconv from the command line would do it, but you'd have to clarify what 'ANSI' means, as that is a generic Windows term. Probably 'Windows-1252'.
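For anyone who would rather script the conversion than use iconv, here is a minimal Python sketch of the same idea. It assumes that 'ANSI' really does mean Windows-1252 (Python's "cp1252" codec), which the thread has not confirmed. The function name, file names, and sample data are made up for illustration.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Re-encode a text file from src_encoding (assumed cp1252) to UTF-8."""
    # newline="" leaves the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demonstration with temporary files and hypothetical sample data.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "input_cp1252.csv")
    dst_path = os.path.join(tmp, "output_utf8.csv")
    with open(src_path, "wb") as f:
        f.write("id,name\n1,café\n".encode("cp1252"))
    convert_to_utf8(src_path, dst_path)
    with open(dst_path, "rb") as f:
        converted = f.read()

print(converted)
```

If the source file contains only 7-bit ASCII characters, the output bytes will be identical to the input bytes, so a check of the result will still report plain ASCII.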

Posted: Mon Jun 10, 2013 7:24 am
by senthil_tcs
Thanks for your response. I tried a sample job because the original job, which transforms XML to a CSV file, is quite complex. For the sample job, I created the file in Windows with the encoding type set to 'ANSI' and moved it to AIX using an FTP client in binary mode. On AIX I see the format reported as 'ASCII TEXT', and the output is also 'ASCII TEXT'. If the output file is in UTF-8, AIX shows it as 'data or International Language text'. I have also tested the job with a sample UTF-8 file as the source, and the target is created in UTF-8.
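A rough way to reproduce this kind of check outside of AIX is sketched below in Python. This is not how the 'file' command works internally; it is only a simplified approximation, and the function name and sample strings are hypothetical.

```python
def classify_encoding(data: bytes) -> str:
    """Very rough classification of raw bytes (illustrative only)."""
    # Every byte below 0x80 means the content is plain 7-bit ASCII.
    if all(b < 0x80 for b in data):
        return "ascii"
    # Otherwise, see whether the bytes form valid UTF-8 sequences.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "other"
    return "utf-8"


print(classify_encoding(b"id,name\n1,test\n"))
print(classify_encoding("1,café\n".encode("utf-8")))
print(classify_encoding("1,café\n".encode("cp1252")))
```

Under this sketch, a file is only reported as UTF-8 when it contains at least one non-ASCII character that is encoded as a valid multi-byte UTF-8 sequence.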

Posted: Mon Jun 10, 2013 3:00 pm
by ray.wurlod
Does this indicate that your problem is resolved? If so, please mark this thread as Resolved, to assist future searchers.

Posted: Tue Jun 11, 2013 12:46 am
by senthil_tcs
The problem is not resolved; I was responding to chulett's question. The issue is that I am unable to create a UTF-8 CSV file in the target. I have even tested this with a sample job using a Row Generator and a Sequential File stage. Is there anything I am missing? UTF-8 is set at the project level and the job level. I even tried setting it in the Sequential File stage, but the result is still the same.

Any light on this issue is much appreciated.

Thanks,
Senthil Kumar

Posted: Tue Jun 11, 2013 3:03 am
by ray.wurlod
Can you explicitly set the extended property "Unicode" for each character string?

Posted: Thu Jun 13, 2013 3:58 am
by senthil_tcs
Thanks, it's still the same. I have explicitly set the extended property across all the stages, but the output is still 'ASCII TEXT'. I assume the setting given in the stage will override any setting given in the job, the project, or uvconfig. In the stage, job, and project the NLS map is set to UTF-8. I mention this because, when I checked uvconfig, the following values were set. I am particularly interested in NLSDEFSEQMAP, as its comment says '# with sequential file input/output to a file or # device that has no explicit map associated with # it. Can be overridden by a SET.SEQ.MAP command'.

I am not sure if the issue is because of this.

Please share your thoughts or if you have any other suggestion please advise.

NLSDEFSEQMAP ISO8859-1

NLSMODE = 1
NLSREADELSE = 1
NLSWRITEELSE = 1
NLSDEFSOCKMAP = NONE
NLSDEFFILEMAP = ISO8859-1
NLSDEFDIRMAP = ISO8859-1+MARKS
NLSNEWFILEMAP = NONE
NLSNEWDIRMAP = ISO8859-1
NLSDEFPTRMAP = ISO8859-1
NLSDEFTERMMAP = ISO8859-1
NLSDEFDEVMAP = ISO8859-1
NLSDEFGCIMAP = NONE
NLSDEFSRVMAP = MS1252-CS
NLSDEFSEQMAP = ISO8859-1
NLSOSMAP = ISO8859-1+MARKS
NLSLCMODE = 1
NLSDEFUSERLC = US-ENGLISH
NLSDEFSRVLC = US-ENGLISH

Thanks,
Senthil

Posted: Thu Jun 13, 2013 1:29 pm
by Mike
Just to clarify, 7-bit ASCII characters are a subset of UTF-8, so they are encoded exactly the same and there is no conversion. If your entire "ANSI" source file consists of 7-bit ASCII characters then you will see no difference. Try including some character in your source file that you know will convert to a 2+ byte UTF-8 character.
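A small Python sketch of this point, assuming the "ANSI" source is Windows-1252 ("cp1252" in Python) and using made-up sample strings:

```python
ascii_text = "id,name,amount"   # 7-bit ASCII characters only
accented_text = "café"          # contains one non-ASCII character

# Pure ASCII text yields the same bytes in both encodings.
ascii_cp1252 = ascii_text.encode("cp1252")
ascii_utf8 = ascii_text.encode("utf-8")
print(ascii_cp1252 == ascii_utf8)

# The accented character is one byte in cp1252 and two bytes in UTF-8.
accented_cp1252 = accented_text.encode("cp1252")
accented_utf8 = accented_text.encode("utf-8")
print(accented_cp1252)  # b'caf\xe9'
print(accented_utf8)    # b'caf\xc3\xa9'
```

Because the ASCII-only bytes are identical either way, a tool that inspects the file has nothing to distinguish "ANSI" from UTF-8 and will typically report the simpler answer.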

Mike

Posted: Thu Jun 13, 2013 11:26 pm
by senthil_tcs
Hello Mike,
Thanks, I will check on this and get back to you.

Thanks,
Senthil

Posted: Mon Jun 17, 2013 1:15 am
by senthil_tcs
The file is created as UTF-8 if I pass some UTF-8 special characters. I am still not clear why the same does not happen when we pass normal characters, which are also valid UTF-8 characters. Any thoughts?

Thanks,
Senthil