DataStage 7.5: UTF8

Sum18us · Post by **Sum18us** » Mon Apr 14, 2008 10:57 am

If only a few of my jobs need to handle multiple languages (the rest handle just English), do you suggest setting NLS = UTF8 at the project level or at the job level? Also, is UTF8 preferred for English?

ray.wurlod · Post by **ray.wurlod** » Mon Apr 14, 2008 4:08 pm

Welcome aboard.

The "preferred" character map is the one that is used to encode your data. So, if your data are encoded using UTF-8, then use that. If your Chinese data are encoded using BIG-5 or GB2312, then use that map.

Most character maps include what you describe as "English" characters.
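To illustrate the point about matching the map to the data, here is a minimal Python sketch (not DataStage code). The sample string and the `round_trips` helper are made up for illustration. It shows that bytes only decode correctly with the encoding they were written in, while plain ASCII "English" text comes out the same under all three maps.

```python
# Sample text; any Chinese string would do (chosen for illustration only).
text = "中文"

# The same characters produce different bytes under different character maps.
utf8_bytes = text.encode("utf-8")
big5_bytes = text.encode("big5")
gb_bytes = text.encode("gb2312")


def round_trips(data, encoding, expected):
    """Return True only if `data` decodes to `expected` under `encoding`."""
    try:
        return data.decode(encoding) == expected
    except UnicodeDecodeError:
        # The bytes are not even valid in this encoding.
        return False


# Plain ASCII text encodes to identical bytes in all of these maps.
ascii_same = (
    "Hello".encode("ascii")
    == "Hello".encode("utf-8")
    == "Hello".encode("big5")
    == "Hello".encode("gb2312")
)

print(round_trips(utf8_bytes, "utf-8", text))  # right map for the data
print(round_trips(big5_bytes, "utf-8", text))  # wrong map for the data
print(ascii_same)
```

Choosing the wrong map either fails outright or silently produces different characters, which is why the map has to follow the data rather than a general preference.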

Sum18us · Post by **Sum18us** » Tue Apr 15, 2008 9:52 am

Thanks for the response.

We're extracting information from Oracle (NLS_CHARACTERSET=AL32UTF8). Do you see any issue with using UTF8 at the project level for all languages (including English) in DataStage? Thoughts?

ray.wurlod · Post by **ray.wurlod** » Tue Apr 15, 2008 3:16 pm

Of course I don't see any. I can't see your data. Try UTF-8; chances are that that's the best fit for the Oracle map. Is this what's specified in the oracle_cs.txt file?

Sum18us · Post by **Sum18us** » Wed Apr 16, 2008 11:03 am

Thanks for your prompt response!

The problem that I see is this: the source is Oracle in AL32UTF8 (Unicode 4.0) --> DataStage in UTF8 (Unicode 3.0) --> target Oracle in AL32UTF8 (Unicode 4.0). So certain characters that can be inserted into Oracle might not pass through DataStage. Is my understanding correct? Also, I guess the oracle_cs.txt file is used for parallel jobs, and it lists all the character sets that DS supports. Also, DS 7.5 doesn't support AL32UTF8. Please correct/suggest.
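One way to test this concern on real data rather than guess is to look for characters outside the Basic Multilingual Plane (code points above U+FFFF), which need four bytes in UTF-8. The Python sketch below is only that: it assumes such supplementary characters are the ones most likely to be affected by an older Unicode level, which is an assumption and not documented DataStage behaviour. The function name and sample string are hypothetical.

```python
def supplementary_chars(text):
    """Return the characters in `text` whose code point is above U+FFFF.

    Assumption (not confirmed for DataStage): these four-byte UTF-8
    characters are the ones worth checking first.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


# Hypothetical sample: ASCII, two common CJK characters, and U+20000,
# a CJK character that lies outside the Basic Multilingual Plane.
sample = "abc 中文 \U00020000"

found = supplementary_chars(sample)
for ch in found:
    print(f"U+{ord(ch):04X} needs {len(ch.encode('utf-8'))} bytes in UTF-8")
```

If a scan like this over extracted data finds nothing, the difference in Unicode level is less likely to matter for that data set.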

ray.wurlod · Post by **ray.wurlod** » Wed Apr 16, 2008 4:44 pm

Just because Oracle does it doesn't make it a standard. AL32UTF8 is - to my mind - Oracle being precious (I think it stands for American/ASCII Language 32-bit Unicode Transformation Format 8-bit, which the rest of the world knows as UTF-8).

That said, there is more than one eight-bit encoding for Unicode. The one used internally in DataStage is such a special case.

Also, Unicode/UTF does not encode "language" - it encodes characters. There is a Unicode code point for the smiley face character, even though this does not occur in any language.
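The distinction between a character's code point and its encoded bytes can be seen in a short Python sketch. It uses U+263A (WHITE SMILING FACE) as the smiley; that choice of character is an assumption, since the post does not say which one is meant.

```python
import unicodedata

# U+263A is one smiley character in Unicode (chosen here for illustration).
smiley = "\u263A"

code_point = ord(smiley)          # the abstract number assigned by Unicode
name = unicodedata.name(smiley)   # the character's official name

# The same code point becomes different bytes under different encodings.
utf8 = smiley.encode("utf-8")
utf16 = smiley.encode("utf-16-be")

print(f"U+{code_point:04X} {name}")
print("UTF-8 bytes: ", utf8.hex())
print("UTF-16 bytes:", utf16.hex())
```

The code point identifies the character; the character map only decides how that number is written out as bytes.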

ajay.vaidyanathan · Post by **ajay.vaidyanathan** » Fri Apr 18, 2008 8:55 am

In DataStage, for languages other than English, please select "Unicode" as the Extended type of the column. This helps the data to be interpreted properly if it is anything other than English.