Transferring Japanese binary data from DB2 to Oracle

smleonard
Participant
Posts: 23
Joined: Tue Apr 27, 2004 11:48 am
Location: Westfield Center, OH

Transferring Japanese binary data from DB2 to Oracle

Post by smleonard »

Hello,

I've run into a slight snag with my design and was wondering if anyone else had ever encountered a similar problem:

I am connecting to a DB2 database using the DB2 stage [not ODBC]. There are a few instances where Japanese characters are stored in fields alongside English text. One particular field I am interested in at the moment is a LONG VARCHAR field where the values are stored as binary.

When importing the table definition, the field is listed as LongVarBinary. When using this, the data is returned as "??LC > IN: ALERT ??", where the ?? marks appear where there should be Japanese characters.

[Our DBA has set up an iSQL website where we can go to verify Unicode characters]

If I change the datatype to LongVarChar, I am able to see the hex representation of how the data is stored:
B9E1C4C74C43203E20494E3A20414C455254B8A1C3CE

'B9 E1 C4 C7' and 'B8 A1 C3 CE' represent the Japanese characters.
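
If it helps, the readable part of that dump lines up with the "??LC > IN: ALERT ??" output. The quick byte-by-byte check below (plain Python, nothing DataStage-specific, just how I eyeballed it outside the tool) shows which bytes fall outside printable ASCII:

Code:

    raw = bytes.fromhex("B9E1C4C74C43203E20494E3A20414C455254B8A1C3CE")
    for b in raw:
        # printable ASCII bytes render as themselves; the rest are the Japanese data
        print(f"{b:02X}", chr(b) if 0x20 <= b < 0x7F else "(non-ASCII)")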

How should I be mapping this into our database?

Some other factors:
*) I've been told that the DB2 code page is 1208 (UTF-8)
*) Our Oracle database is also configured for Unicode support [other Japanese characters are already stored there]
*) We have the NLS option enabled on the server (version 7.1)
*) I am currently using a job default NLS map of "UTF-8"

Any ideas?
Thanks,
-Sean
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Sean,

I think your binary field is coming through byte-for-byte unchanged, still in its source code page, while on output DataStage assumes it is UTF-8, so the characters would not show correctly. Remember that "view data" in DataStage, like the displays in many other products, does not support NLS or multibyte characters, so you would expect to see "??" instead of the Kanji or Kana. If you write the data to a sequential file and then view it with a binary tool or an editor that you KNOW displays it correctly, you can be certain whether something is actually going wrong.
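
As a sanity check outside of DataStage, you could also run the hex string from your post through a few candidate code pages. This is only an illustration in Python (any language with codec support would do), and euc_jp / cp932 are guesses on my part - your post does not confirm which map the data is really in:

Code:

    raw = bytes.fromhex("B9E1C4C74C43203E20494E3A20414C455254B8A1C3CE")
    for enc in ("utf-8", "euc_jp", "cp932"):
        try:
            print(enc, "->", raw.decode(enc))   # readable Japanese => likely the real map
        except UnicodeDecodeError:
            print(enc, "-> not a valid", enc, "byte sequence")

Whichever candidate produces sensible Japanese around the "LC > IN: ALERT" text is the map the column is really stored in; if the UTF-8 attempt fails outright, that alone tells you the bytes are not in the 1208 map your DBA mentioned.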

If you push your data into a sequential file and then read it back (you can use a named pipe to save disk I/O), you can specify per-column NLS mapping on the sequential file read and explicitly convert that one column to UTF-8, using whatever map the data is really stored in, while leaving the other columns unchanged.
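
In effect that per-column mapping does the equivalent of the following sketch: decode the raw column bytes with their real source map so the result can be passed on as UTF-8. Again, "euc_jp" is only a placeholder - substitute whatever map the check above points to:

Code:

    def remap_column(raw: bytes, source_map: str = "euc_jp") -> str:
        # interpret the bytes in their real source map; the resulting
        # Unicode string can then be written out / loaded as UTF-8
        return raw.decode(source_map)

    col = bytes.fromhex("B9E1C4C74C43203E20494E3A20414C455254B8A1C3CE")
    print(remap_column(col))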