Error reading Japanese data

Posted: Fri Apr 09, 2010 5:45 pm
by BradMiller
I have a requirement to read Japanese data from a .doc file and write it into a DB2 database table. Both DataStage and DB2 are NLS installations. NLS_LANG on the operating system is set to EN_US, but the operating system also has Unicode locales installed. I am reading the data from a sequential file with the NLS map set to EN_US.UTF-8, but it still cannot be read. Do I need to export NLS_LANG as EN_US.UTF-8 in dsenv and leave the operating system locale as it is? Would that resolve the problem? If not, could you please suggest what steps need to be done? I last did this six years ago and don't remember the procedure. Also, do I need to make any changes on the database side?
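If the fix is indeed to set NLS_LANG in dsenv, the change would look something like the fragment below. This is a sketch only: the exact variable and value depend on your platform and database client, and the EN_US.UTF-8 value is the one proposed in the question, not a verified recommendation.

```shell
# Hypothetical addition to $DSHOME/dsenv (sourced by DataStage at startup).
# Assumes the DB2 client honours NLS_LANG-style locale settings on this platform.
NLS_LANG=EN_US.UTF-8
export NLS_LANG
```

After editing dsenv you would normally restart the DataStage engine so the new environment is picked up by the server processes.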

Posted: Fri Apr 09, 2010 6:34 pm
by ray.wurlod
Japanese data is fraught - there are at least 14 different encodings in which it might be supplied. Your first step will need to be to discover how the data are actually encoded. A common map is SHIFT_JIS but there are even variants of that. You need to ask the providers of the data to be as complete and explicit as possible.

You may encounter other "nice" things, particularly if the data are sourced from mainframes, like different encodings in different columns, or even encodings changing part way through a string (signalled by Shift-Out and Shift-In characters, Char(14) and Char(15)).

I personally believe that server jobs do a better job with Japanese data than parallel jobs, but I have not had the opportunity to test my theory in version 8.