Considerations of changing src NLS from ISO8859-1 to UTF8

crsimms
Charter Member
Posts: 21
Joined: Mon May 30, 2005 4:21 am

Post by crsimms »

Hello,

Can anyone tell me what side effects to be aware of when changing the NLS map of a job's driver stage (a sequential file) from ISO8859-1 to UTF8? This particular job defaults to ISO8859-1 and uses hash file writes/reads and database table writes/reads. I do not know whether the hash files can continue to use the job's default NLS (ISO8859-1) and still operate properly against the incoming UTF8 data. I understand that UTF8 can represent a character in one to four bytes. I do not know how DataStage stores characters internally or how it handles translation between different code sets. At the very least, I assume the database tables need to be recreated using nchar instead of char.

Any help would be appreciated.

Thanks,
Chris Simms

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Welcome aboard! :D

Do you understand that UTF8 is one of many possible encodings of Unicode? You are right in that, theoretically, each code point may be represented by one to four bytes under this encoding. Are you certain that this is how your external data are encoded?
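To make the one-to-four-byte point concrete, here is a minimal Python sketch. It uses Python's own codecs, not DataStage, and the sample characters are chosen purely for illustration; they are not taken from the job's data.

```python
# Illustration only: sample characters are assumptions, not the job's data.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji


def utf8_width(ch: str) -> int:
    """Return the number of bytes UTF-8 needs for a single code point."""
    return len(ch.encode("utf-8"))


widths = {ch: utf8_width(ch) for ch in samples}

for ch, width in widths.items():
    print(f"U+{ord(ch):04X} needs {width} byte(s) in UTF-8")
```

Plain ASCII stays at one byte, so a file that contains only ASCII looks identical in ISO8859-1 and UTF8; the difference only shows up once accented or non-Latin characters arrive.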

The NLS map converts between the external data's encoding and a specialized internal encoding (called UV-UTF8) of Unicode. Unless you correctly specify how the external data are encoded, you are likely to encounter unmappable characters.
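As a rough analogy for what a wrongly specified map does, the Python sketch below decodes the same bytes with the wrong codec. This is Python's codec machinery, not a DataStage NLS map, and the sample word is an assumption made up for the example.

```python
# Analogy only: Python codecs stand in for NLS maps; "cafe" with an
# accented e is a made-up sample value.
text = "caf\u00e9"

utf8_bytes = text.encode("utf-8")        # accented e becomes two bytes: C3 A9
latin1_bytes = text.encode("iso8859-1")  # accented e is the single byte E9

# UTF-8 data read with an ISO8859-1 map: no error is raised, but each
# byte of the two-byte sequence is turned into its own character.
misread = utf8_bytes.decode("iso8859-1")

# ISO8859-1 data read with a UTF-8 map: the lone byte E9 is not a valid
# UTF-8 sequence, which is comparable to an unmappable character.
try:
    latin1_bytes.decode("utf-8")
    unmappable = False
except UnicodeDecodeError:
    unmappable = True

print(repr(misread), unmappable)
```

The first failure mode is the more dangerous of the two, because the data loads without complaint and is silently corrupted.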
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
crsimms

Post by crsimms »

Ray,

Thanks for responding. UTF8 is definitely how the incoming data will be encoded. As far as the NLS settings for the outputs are concerned (hash files, sequential files, the database, and the SAP R3 IDoc loader), they can be changed to use UTF8 as well. The database is a bit more complicated, in that either the table schema (char to nchar) or the referenced code page will have to be modified. Either way, I believe the database will have to be exported, rebuilt, and then reloaded.
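One reason the schema may need attention: if a column's length is counted in bytes (this depends on the database and its settings, so treat it as an assumption), a value that fit under ISO8859-1 can overflow once stored as UTF8. A minimal Python sketch, with a hypothetical helper name and a made-up sample value:

```python
def fits_in_bytes(value: str, byte_limit: int, encoding: str) -> bool:
    """Hypothetical helper: does `value` fit in `byte_limit` bytes when encoded?"""
    return len(value.encode(encoding)) <= byte_limit


city = "M\u00fcnchen"  # made-up sample: 7 characters, one of them non-ASCII

fits_latin1 = fits_in_bytes(city, 7, "iso8859-1")  # u-umlaut is 1 byte here
fits_utf8 = fits_in_bytes(city, 7, "utf-8")        # u-umlaut is 2 bytes here

print(fits_latin1, fits_utf8)
```

A check like this could be run over an export of the existing data to see which columns would need widening before the reload.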

Thanks again,
Chris Simms
