DataStage 7.5: UTF8

Moderators: chulett, rschirm, roy

Sum18us
Participant
Posts: 9
Joined: Tue Jul 24, 2007 11:30 am

Post by Sum18us »

If I have a few jobs that need to handle multiple languages (the rest handle only English), would you suggest setting NLS = UTF8 at the project level or at the job level? Also, is UTF8 preferred for English?
Thanks $N
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Welcome aboard.

The "preferred" character map is the one that is used to encode your data. So, if your data are encoded using UTF-8, then use that. If your Chinese data are encoded using BIG-5 or GB2312, then use that map.

Most character maps include what you describe as "English" characters.
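
To illustrate the point about maps, here is a minimal sketch in Python (used only as a convenient scratchpad outside DataStage; the sample strings are invented). It suggests that plain English text comes out as the same bytes under several maps, while Chinese text does not, so it has to be read back with the map it was written in.

```python
# Sample strings are invented for illustration only.
english = "Hello"
chinese = "中文"

# English (ASCII-range) characters encode to identical bytes in all of these maps.
english_bytes = {m: english.encode(m) for m in ["ascii", "utf-8", "big5", "gb2312"]}

# Chinese characters encode to different bytes depending on the map.
chinese_bytes = {m: chinese.encode(m) for m in ["utf-8", "big5", "gb2312"]}

# Reading BIG-5 bytes with the GB2312 map does not give back the original text.
misread = chinese.encode("big5").decode("gb2312", errors="replace")

for name, data in chinese_bytes.items():
    print(name, data.hex())
print("misread as:", misread)
```

The practical upshot is the same as above: pick the map that matches how the data were actually encoded, rather than the one that sounds most general.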
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sum18us
Participant
Posts: 9
Joined: Tue Jul 24, 2007 11:30 am

Post by Sum18us »

Thanks for the response.

We're extracting information from Oracle (NLS_CHARACTERSET=AL32UTF8). Do you see any issue with using UTF8 at the project level for all languages (including English) in DataStage? Thoughts?
Thanks $N
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Of course I don't see any. I can't see your data. Try UTF-8; chances are that that's the best fit for the Oracle map. Is this what's specified in the oracle_cs.txt file?
Sum18us
Participant
Posts: 9
Joined: Tue Jul 24, 2007 11:30 am

Post by Sum18us »

Thanks for your prompt response!

The problem that I see is that the source is Oracle in AL32UTF8 (Unicode 4.0) --> DataStage in UTF8 (Unicode 3.0) --> target Oracle in AL32UTF8 (Unicode 4.0). So certain characters that can be inserted into Oracle might not pass through DataStage. Is my understanding correct? Also, I guess the oracle_cs.txt file is used for parallel jobs, and it lists all the character sets that DS supports. Also, DS 7.5 doesn't support AL32UTF8. Please correct/suggest.
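
One rough way to gauge whether this is a real risk for a given data set is to look for characters outside the Basic Multilingual Plane, since supplementary characters were first assigned after Unicode 3.0 and take four bytes in UTF-8. The sketch below is Python run on a sample of the extracted text, outside DataStage; the function name and sample values are hypothetical. It does not show how DataStage 7.5 actually treats such characters, and it will not catch BMP characters added between Unicode 3.0 and 4.0.

```python
def find_supplementary(text):
    """Return (index, 'U+XXXX') for every character above U+FFFF."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


# Hypothetical sample values standing in for rows extracted from Oracle.
sample_ok = "Grüße, 中文"                 # BMP characters only
sample_risky = "CJK Ext B: \U00020000"    # contains one supplementary character

print(find_supplementary(sample_ok))
print(find_supplementary(sample_risky))

# A supplementary character takes four bytes in UTF-8.
print(len("\U00020000".encode("utf-8")))
```

If nothing is flagged on a representative sample, the version gap is less likely to matter in practice, though testing the job end to end is still the only real proof.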
Thanks $N
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Just because Oracle does it doesn't make it a standard. AL32UTF8 is - to my mind - Oracle being precious (I think it stands for American/ASCII Language 32-bit Unicode Transformation Format 8-bit, which the rest of the world knows as UTF-8).

That said, there is more than one eight-bit encoding for Unicode. The one used internally in DataStage is such a special case.

Also, Unicode/UTF does not encode "language" - it encodes characters. There is a Unicode code point for the smiley face character, even though this does not occur in any language.
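
The distinction between a character's code point and its encoding can be sketched in a few lines of Python (chosen only for convenience, nothing DataStage-specific is assumed). The code point is a number assigned by Unicode; the bytes depend on which encoding is applied to it.

```python
import unicodedata

smiley = "\u263A"

code_point = ord(smiley)            # the number Unicode assigns to the character
name = unicodedata.name(smiley)     # its official Unicode character name

# The same code point becomes different bytes under different encodings.
utf8_bytes = smiley.encode("utf-8")
utf16_bytes = smiley.encode("utf-16-be")

print(f"U+{code_point:04X}", name)
print("UTF-8 :", utf8_bytes.hex())
print("UTF-16:", utf16_bytes.hex())
```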
Last edited by ray.wurlod on Fri Apr 18, 2008 3:48 pm, edited 1 time in total.
ajay.vaidyanathan
Participant
Posts: 53
Joined: Fri Apr 18, 2008 8:13 am
Location: United States

If I've few jobs that need to handle multiple language

Post by ajay.vaidyanathan »

In DataStage, for languages other than English, please select "Unicode" as the Extended type. This helps DataStage interpret the data correctly when it is anything other than English.
Regards
Ajay