Chinese Characters in Datastage

Corvus
Participant
Posts: 10
Joined: Sat Apr 04, 2009 6:39 am

Chinese Characters in Datastage

Post by Corvus »

The DataStage project is set up with UTF-8 as the default NLS map. The DB2 UDB 9.1 databases are Unicode databases with Codeset = UTF-8 and CodePage = 1208.

DataStage is populating "??" instead of Chinese characters.

PLEASE HELP !!!
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Where is it showing "??"? The "View Data" function won't display this data correctly. You will also need to provide a bit more information in order to get a good analysis of where the conversion is going wrong (if at all).
Corvus
Participant
Posts: 10
Joined: Sat Apr 04, 2009 6:39 am

Post by Corvus »

I am now able to populate Chinese characters from DB2 UDB 9.1 to DB2 UDB 9.1, although they do not show up in "View Data".
But in MS SQL Server 2005 the characters are showing as "�".

The project default NLS is UTF-8 and the job stages are also using UTF-8 as NLS. The locale was set to US-ENGLISH; I changed it to CN-CHINESE and the job still behaved the same way.

Thanks,
Corvus
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

What were your NLS settings when you tried to view the data in DB2, and what tool/program did you use to display it? Over half of the NLS issues I see are not really errors, but display problems. NLS isn't too complicated once you know which NLS character set you are using in each stage. But you need to be consistent - if you wrote UTF-8 you need to read it with UTF-8; if you read it with ISO8859-1 then you are going to get gibberish. If you get the types of characters you posted, then you need to explicitly get the hex codes of the characters instead of their glyphs and check them against the code table to see what is actually stored.
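To illustrate the point about hex codes and consistent character sets, here is a minimal Python sketch (not DataStage code). The sample string and the helper name `describe` are assumptions made up for the illustration; `latin-1` is Python's name for ISO8859-1.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return one line per character: code point, UTF-8 bytes in hex, Unicode name."""
    lines = []
    for ch in text:
        utf8_hex = ch.encode("utf-8").hex(" ").upper()
        name = unicodedata.name(ch, "<unnamed>")
        lines.append(f"U+{ord(ch):04X}  {utf8_hex}  {name}")
    return lines


sample = "中文"  # hypothetical test value, not taken from the poster's data

# Look at the codes rather than the glyphs.
for line in describe(sample):
    print(line)

# Written as UTF-8 but read as ISO8859-1: every byte becomes its own character.
misread = sample.encode("utf-8").decode("latin-1")
print(len(sample), "characters written,", len(misread), "characters read back")

# The bytes themselves are intact; reading them with the right map recovers the text.
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered == sample)
```

The same comparison can be made against whatever a database client or hex editor reports for the stored value, which separates a storage problem from a display problem.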

You can tell that something has changed, since earlier the "bad" output was 2 bytes and now it is only 1. That means the system has probably detected a 2-byte character correctly but cannot display that code point with whatever display NLS setting you have.
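As a rough illustration of why the two symptoms can point to different places, the Python sketch below (an assumption-laden demo, not a diagnosis of this job) shows that a "?" is typically produced when text is converted to a character set that lacks the character, while "�" (U+FFFD) is typically produced when bytes cannot be decoded with the map in use. The sample string is hypothetical.

```python
text = "中文"  # hypothetical sample value

# Converting to a character set that has no Chinese characters:
# each unrepresentable character is replaced by one "?".
lossy = text.encode("ascii", errors="replace")
print(lossy)

# Decoding bytes that are not valid for the map in use:
# here a UTF-8 sequence cut off after two of its three bytes.
truncated = text.encode("utf-8")[:2]
decoded = truncated.decode("utf-8", errors="replace")
print("\ufffd" in decoded)
```

Once the data has been turned into "?" the original characters are gone, so the step where that conversion happens is the one to find first.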
Corvus
Participant
Posts: 10
Joined: Sat Apr 04, 2009 6:39 am

Post by Corvus »

The Project Default NLS is UTF-8 and the Stage NLS used to view data is also UTF-8.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Umm, at the risk of repeating myself - "View Data" should not be used to test whether or not NLS characters are correctly represented.
Corvus
Participant
Posts: 10
Joined: Sat Apr 04, 2009 6:39 am

Post by Corvus »

What kind of collation should be used to accept Chinese characters in a SQL Server 2005 database? Any pointers would be very helpful.

Thanks,
Corvus
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Collation probably doesn't apply, as it affects the sort order of characters and not their representation.
With NLS problems you need to take the problem analysis step by step. First just read some data from SQL Server and output it to a flat file. Look at the flat file with a tool you know supports multibyte characters. Are the characters displayed correctly?
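The flat-file check can also be done without relying on any viewer. The following Python sketch assumes the file was written with a UTF-8 map; the function name `check_utf8_file`, the report keys, and the demo file contents are all made up for illustration, and the CJK test only covers the basic U+4E00 to U+9FFF block.

```python
import tempfile
from pathlib import Path


def check_utf8_file(path) -> dict:
    """Report whether a file decodes as UTF-8 and how many CJK characters it holds."""
    raw = Path(path).read_bytes()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset of the first undecodable byte.
        return {"valid_utf8": False, "error_offset": exc.start,
                "cjk_count": 0, "question_marks": raw.count(b"?")}
    cjk_count = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return {"valid_utf8": True, "error_offset": None,
            "cjk_count": cjk_count, "question_marks": text.count("?")}


# Demo with a stand-in for the job's output file (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "extract.txt"
    demo.write_bytes("id|name\n1|中文\n".encode("utf-8"))
    report = check_utf8_file(demo)
    print(report)
```

A file that is valid UTF-8 with a non-zero CJK count suggests the read side is fine and the problem lies further downstream; a file full of "?" suggests the characters were already lost when reading from the source.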
gpatton
Premium Member
Posts: 47
Joined: Mon Jan 05, 2004 8:21 am

Post by gpatton »

The issue is the configuration of the SQL Server database. You need to set it up to support multi-byte characters - UTF-8, UTF-16, or UTF-32 are possible options. If the database is set up for US-English you will have problems.