DataStage 7.5: UTF8
If I have a few jobs that need to handle multiple languages (the rest handle only English), would you suggest setting NLS = UTF8 at the project level or at the job level? Also, is UTF8 preferred for English?
Thanks $N
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Welcome aboard.
The "preferred" character map is the one that is used to encode your data. So, if your data are encoded using UTF-8, then use that. If your Chinese data are encoded using BIG-5 or GB2312, then use that map.
Most character maps include what you describe as "English" characters.
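As a small illustration of the point above (a sketch in Python, not anything DataStage-specific; the sample strings are made up and the codec names are Python's, not DataStage map names): plain English text yields the same bytes under many maps, while non-ASCII text only decodes correctly with the map it was encoded with.

```python
# Hypothetical sample values, chosen only for illustration.
english = "Sydney"
chinese = "中文"

# ASCII-range ("English") characters encode to identical bytes in all three maps.
english_bytes = {codec: english.encode(codec) for codec in ("utf-8", "big5", "gb2312")}

# Non-ASCII characters encode to different bytes depending on the map.
chinese_utf8 = chinese.encode("utf-8")
chinese_big5 = chinese.encode("big5")


def decodes_cleanly(data: bytes, codec: str) -> bool:
    """Return True if `data` is a valid byte sequence for `codec`."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


print(english_bytes)
print(decodes_cleanly(chinese_big5, "big5"), decodes_cleanly(chinese_big5, "utf-8"))
```

So the map has to match how the data were actually written; the English-only rows would not reveal a wrong choice.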
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod wrote: Welcome aboard. The "preferred" character map is the one that is used to encode your data. So, if your data are encoded using UTF-8, then use that. If your Chinese data are encoded using BIG-5 or ...
Thanks for the response.
We're extracting information from Oracle (NLS_CHARACTERSET=AL32UTF8). Do you see any issue with using UTF8 at the project level for all languages (including English) in DataStage? Thoughts?
Thanks $N
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Of course I don't see any. I can't see your data. Try UTF-8; chances are that that's the best fit for the Oracle map. Is this what's specified in the oracle_cs.txt file?
ray.wurlod wrote: Of course I don't see any. I can't see your data. Try UTF-8; chances are that that's the best fit for the Oracle map. Is this what's specified in the oracle_cs.txt file? ...
Thanks for your prompt response!
The problem that I see is that the source is Oracle in AL32UTF8 (Unicode 4.0) --> DataStage in UTF8 (Unicode 3.0) --> target Oracle in AL32UTF8 (Unicode 4.0). So certain characters that can be inserted in Oracle may not pass through DataStage. Is my understanding correct? Also, I guess the oracle_cs.txt file is used for parallel jobs, and it lists all the character sets that DS supports. Also, DS 7.5 doesn't support AL32UTF8. Please correct/suggest.
Thanks $N
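One way to probe the concern above before committing to a map is to scan sample data for characters that an older Unicode version does not define. The following is only a sketch under stated assumptions: it uses Python's bundled Unicode 3.2.0 database as a stand-in (Unicode 3.0 itself is not available there), the helper names and the sample value are hypothetical, and it says nothing about what DataStage 7.5 actually does with such characters.

```python
import unicodedata

# Frozen Unicode 3.2.0 database shipped with Python; used here only as an
# approximation of "an older Unicode version".
OLD_UCD = unicodedata.ucd_3_2_0


def unassigned_in_old_unicode(text: str) -> list:
    """Return the characters of `text` that are unassigned in Unicode 3.2.0."""
    return [ch for ch in text if OLD_UCD.category(ch) == "Cn"]


def outside_bmp(text: str) -> list:
    """Return the characters above U+FFFF (these take 4 bytes in UTF-8)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


sample = "price: 10\u20ac \U0001F600"  # hypothetical column value
print(unassigned_in_old_unicode(sample))
print(outside_bmp(sample))
```

If both lists come back empty for representative extracts, the version difference is unlikely to matter for that data; if not, those rows are the ones worth testing end to end.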
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Just because Oracle does it doesn't make it a standard. AL32UTF8 is - to my mind - Oracle being precious (I think it stands for All Languages, 32-bit, Unicode Transformation Format 8-bit, which the rest of the world knows as UTF-8).
That said, there is more than one eight-bit encoding for Unicode. The one used internally in DataStage is such a special case.
Also, Unicode/UTF does not encode "language" - it encodes characters. There is a Unicode code point for the smiley face character, even though this does not occur in any language.
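To make the distinction between a character and its encoding concrete, here is a minimal Python sketch (an illustration only; it does not reproduce the encoding DataStage uses internally). The smiley face is one code point, U+263A, and each encoding form turns that same code point into different bytes.

```python
import unicodedata

smiley = "\u263a"  # a single code point, U+263A

# The character has a name and a code point regardless of any language or encoding.
name = unicodedata.name(smiley)
code_point = f"U+{ord(smiley):04X}"

# The same code point serialized by three different Unicode encoding forms.
encodings = {codec: smiley.encode(codec) for codec in ("utf-8", "utf-16-be", "utf-32-be")}

print(name, code_point)
for codec, data in encodings.items():
    print(codec, data.hex(" "))
```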
Last edited by ray.wurlod on Fri Apr 18, 2008 3:48 pm, edited 1 time in total.
-
- Participant
- Posts: 53
- Joined: Fri Apr 18, 2008 8:13 am
- Location: United States
"If I've few jobs that need to handle multiple language"
In DataStage, for languages other than English, please select "Unicode" as the extended type of the column. This helps DataStage interpret the data correctly when it is anything other than English.
Regards
Ajay