DataStage 7.5: UTF8

Posted: Mon Apr 14, 2008 10:57 am
by Sum18us
I have a few jobs that need to handle multiple languages (the rest handle only English). Would you suggest setting NLS = UTF8 at the project level or at the job level? Also, is UTF8 preferred for English?

Posted: Mon Apr 14, 2008 4:08 pm
by ray.wurlod
Welcome aboard.

The "preferred" character map is the one that is used to encode your data. So, if your data are encoded using UTF-8, then use that. If your Chinese data are encoded using BIG-5 or GB2312, then use that map.

Most character maps include what you describe as "English" characters.
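
To illustrate the point about matching the map to the data, here is a minimal Python sketch (not DataStage code). The sample string and the helper name try_decode are made up for the example. It encodes the same text under three maps, shows that each decodes cleanly only under its own map, and shows that the plain ASCII ("English") portion is byte-identical in all three.

```python
# Illustrative only: the sample text and helper name are assumptions,
# not anything taken from DataStage.
text = "中文 data"
maps_to_try = ("utf-8", "big5", "gb2312")

# The same characters become different byte sequences under each map.
encoded = {name: text.encode(name) for name in maps_to_try}


def try_decode(raw, codec):
    """Return the decoded text, or None if the bytes are invalid for that codec."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


for name, raw in encoded.items():
    # Decoding with the map that produced the bytes round-trips cleanly.
    print(name, raw.hex(), "->", try_decode(raw, name))

# Decoding Big5 bytes as UTF-8 fails: the map must match the data.
print("big5 bytes read as utf-8:", try_decode(encoded["big5"], "utf-8"))

# The ASCII part is encoded identically under all three maps.
ascii_bytes = {name: "data".encode(name) for name in maps_to_try}
print(ascii_bytes)
```

A wrong map does not always fail this loudly. Some mismatches decode without error into the wrong characters, which is why knowing how the source data is actually encoded matters more than picking a "preferred" map.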

Posted: Tue Apr 15, 2008 9:52 am
by Sum18us
Thanks for the response.

We're extracting data from Oracle (NLS_CHARACTERSET=AL32UTF8). Do you see any issue with using UTF8 at the project level for all languages (including English) in DataStage? Thoughts?

Posted: Tue Apr 15, 2008 3:16 pm
by ray.wurlod
Of course I don't see any. I can't see your data. Try UTF-8; chances are that that's the best fit for the Oracle map. Is this what's specified in the oracle_cs.txt file?

Posted: Wed Apr 16, 2008 11:03 am
by Sum18us
Thanks for your prompt response!

The problem I see is this: the source is Oracle in AL32UTF8 (Unicode 4.0) --> DataStage in UTF8 (Unicode 3.0) --> target Oracle in AL32UTF8 (Unicode 4.0). So certain characters that can be stored in Oracle may not pass through DataStage intact. Is my understanding correct? Also, I believe the oracle_cs.txt file is used for parallel jobs and lists all the character sets that DS supports, and that DS 7.5 doesn't support AL32UTF8. Please correct me or suggest an approach.
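
One way to check whether this concern applies to a given data set is to look for supplementary characters (code points above U+FFFF), which need four bytes in UTF-8 and are the ones most likely to be affected by differences between Unicode versions and UTF-8 variants. The Python sketch below is only an illustration: the function name and the sample string are made up, and whether DataStage 7.5 actually mishandles such characters is an assumption to be tested, not something confirmed here.

```python
def supplementary_chars(text):
    """Return the characters in text whose code points lie above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


# Hypothetical sample: ASCII, Latin-1, a common CJK character,
# and one supplementary CJK character (U+20000).
sample = "A\u00e9\u4e2d\U00020000"

for ch in sample:
    # UTF-8 uses 1 to 4 bytes depending on the code point.
    print(f"U+{ord(ch):04X} takes {len(ch.encode('utf-8'))} byte(s) in UTF-8")

found = supplementary_chars(sample)
print("supplementary characters:", [f"U+{ord(ch):04X}" for ch in found])
```

If a scan like this over an extract of the source data finds nothing, the difference between the two maps is unlikely to matter for that data.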

Posted: Wed Apr 16, 2008 4:44 pm
by ray.wurlod
Just because Oracle does it doesn't make it a standard. AL32UTF8 is - to my mind - Oracle being precious (I think the AL stands for "All Languages" and the 32 for the maximum number of bits per character; the rest of the world knows the encoding as UTF-8).

That said, there is more than one eight-bit encoding for Unicode. The one used internally in DataStage is such a special case.

Also, Unicode/UTF does not encode "language" - it encodes characters. There is a Unicode code point for the smiley face character, even though this does not occur in any language.
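
A small Python sketch of the distinction between a code point and its encodings, using the smiley face mentioned above. It says nothing about the encoding DataStage uses internally; the codec list is just a set of standard Unicode encoding forms chosen for illustration.

```python
import unicodedata

smiley = "\u263a"  # one character, one code point: U+263A

# The code point identifies the character, independent of any language.
print(f"U+{ord(smiley):04X}", unicodedata.name(smiley))

# The same code point becomes different byte sequences under different encodings.
encodings = {
    codec: smiley.encode(codec).hex()
    for codec in ("utf-8", "utf-16-be", "utf-32-be")
}
for codec, hex_bytes in encodings.items():
    print(codec, hex_bytes)
```

The character map in a job therefore describes how code points are laid out as bytes, not which language the data is in.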

Posted: Fri Apr 18, 2008 8:55 am
by ajay.vaidyanathan
In DataStage, for languages other than English, check "Unicode" under the Extended type. This helps DataStage interpret the data correctly when it is anything other than English.