
Chinese Characters in Datastage

Posted: Wed Jun 17, 2009 4:01 am
by Corvus
The Datastage project is set up with Default NLS as UTF-8. The DB2 UDB 9.1 databases are Unicode databases with Codeset = UTF-8 and CodePage = 1208.

Datastage is populating "??" instead of Chinese characters.

PLEASE HELP !!!

Posted: Wed Jun 17, 2009 4:04 am
by ArndW
Where is it showing "??"? The "View Data" function won't display this data correctly. You will also need to provide a bit more information in order to get a good analysis of where the conversion is going wrong (if at all).

Posted: Wed Jun 17, 2009 6:11 am
by Corvus
I am now able to populate Chinese characters from DB2 UDB 9.1 to DB2 UDB 9.1, although they do not show up in "View Data".
But in MS SQL Server 2005 the characters are showing as "�"

The Project Default NLS is UTF-8 and the job stages are also using UTF-8 as NLS. The Locale was set to US-ENGLISH; I changed it to CN-CHINESE, but the job behaved the same way.

Thanks,
Corvus

Posted: Wed Jun 17, 2009 8:20 am
by ArndW
What were your NLS settings when you tried to view the data in DB2, and what tool/program did you use to display it? Over half of the NLS issues I see are not really errors, but display problems. NLS isn't too complicated once you know which NLS character set you are using in each stage. But you need to be consistent - if you wrote UTF-8 you need to read it with UTF-8; if you read it with ISO8859-1 then you are going to get gibberish. If you get the types of characters you posted, then you need to explicitly get the hex codes of the characters instead of their glyphs and check the table to see what is actually being shown to you.

You can tell that something has changed, since earlier the "bad" output was 2 bytes and now it is only 1, meaning that the system has probably detected a 2-byte character correctly but cannot display that code point with whatever display NLS setting you have.
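
To make the "write UTF-8, read ISO8859-1" point and the hex-code check concrete, here is a minimal sketch in plain Python (outside Datastage). The sample string and the helper names `hex_dump` and `code_points` are assumptions chosen for illustration only; they do not come from the job in question.

```python
# Minimal sketch: the same bytes look right or wrong depending on the
# character set used to read them. The sample text is an assumption.

def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex codes."""
    return " ".join(f"{b:02x}" for b in data)

def code_points(text: str) -> list:
    """Return the Unicode code point of each character as 'U+XXXX'."""
    return [f"U+{ord(ch):04X}" for ch in text]

sample = "\u4e2d\u6587"            # two Chinese characters (hypothetical test data)
utf8_bytes = sample.encode("utf-8")  # what a UTF-8 stage would write

# Reading with the matching character set recovers the original text.
read_consistently = utf8_bytes.decode("utf-8")

# Reading the same bytes as ISO8859-1 yields one character per byte: gibberish.
read_inconsistently = utf8_bytes.decode("iso8859-1")

print(hex_dump(utf8_bytes))
print(code_points(read_consistently))
print(len(read_consistently), len(read_inconsistently))
```

The bytes are identical in both cases; only the interpretation differs, which is why looking at hex codes is more reliable than looking at glyphs.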

Posted: Thu Jun 18, 2009 2:33 am
by Corvus
The Project Default NLS is UTF-8 and the Stage NLS used to view data is also UTF-8.

Posted: Thu Jun 18, 2009 3:51 am
by ArndW
Umm, at the risk of repeating myself - view data should not be used to test whether or not NLS characters are correctly represented.

Posted: Thu Jun 18, 2009 7:27 am
by Corvus
What kind of collation should be used to accept Chinese characters in a SQL Server 2005 DB? Any pointers would be very helpful.

Thanks,
Corvus

Posted: Thu Jun 18, 2009 7:37 am
by ArndW
Collation probably doesn't apply, as it affects the sort order of characters, and not the representation.
With NLS problems you will need to take problem analysis step by step. First just read some data from SQL Server and output it to a flat file. Look at the flat file with a tool you know supports multibyte. Are the characters displayed correctly?
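
If no trusted multibyte-aware viewer is at hand, the flat file's raw bytes can be checked directly. The following is a rough sketch in Python, assuming the file was written as UTF-8; the function name `inspect_bytes` and the report keys are hypothetical, not part of any Datastage or SQL Server tooling.

```python
# Rough sketch: classify the raw bytes of an exported flat file.
# Assumes the file is expected to be UTF-8; all names here are hypothetical.

def inspect_bytes(data: bytes) -> dict:
    """Report whether data is valid UTF-8 and what kind of damage it shows."""
    report = {
        "valid_utf8": True,
        # Literal '?' (0x3F): the characters were lost before the file was written.
        "literal_question_marks": data.count(b"?"),
        # U+FFFD encoded as UTF-8 (EF BF BD): a decoder already replaced bad input.
        "replacement_chars": data.count(b"\xef\xbf\xbd"),
        "cjk_chars": 0,
    }
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        report["valid_utf8"] = False
        return report
    # Count characters in the basic CJK Unified Ideographs block.
    report["cjk_chars"] = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return report

# In-memory examples standing in for file contents.
good = inspect_bytes("\u4e2d\u6587".encode("utf-8"))
lost = inspect_bytes(b"??")
print(good)
print(lost)
```

For a real file, open it in binary mode (`open(path, "rb")`) and pass the result of `read()` to the function. A nonzero CJK count with valid UTF-8 suggests the data itself is fine and any remaining problem is on the display side.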

Posted: Thu Jun 18, 2009 8:06 am
by gpatton
The issue is the configuration of the SQL Server database. You need to set it up to support multi-byte characters - UTF-8, UTF-16, or UTF-32 are possible options. If the DB is set up for US-English you will have problems.