Handling DBCS/CJK characters

Posted: Thu Sep 10, 2009 10:16 am
by mydsworld
I am having trouble viewing a file with CJK characters. Please let me know the following:

1. I am able to view the Chinese characters in a local text editor. In which mode should I FTP the file to the DS server (ASCII/binary)?

2. The DS environment is NLS enabled. In the DS job, I am using a Sequential File stage to read the file. Which NLS map should be used for reading the Chinese data?

Also, please let me know if any other settings need to be done.

Thanks

Posted: Thu Sep 10, 2009 10:22 am
by ArndW
1. What system is the data on, Windows or UNIX? What character set is defined on the system where you can view the data?

2. If you use a binary transfer from your system, then you need to use the same character set in DataStage as on your system.
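ArndW's point can be illustrated with a small sketch (Python is used here purely for illustration; DataStage itself is not involved). The same Chinese character has different byte sequences under GBK/IBM-1386 and under UTF-8, so a binary transfer preserves whichever encoding the source used, and the NLS map on the DataStage side must then name that same encoding:

```python
# Illustration: one Chinese character, two different byte sequences.
# A binary FTP transfer preserves the bytes exactly, so the NLS map
# must name the encoding the file actually uses -- the bytes alone
# do not identify themselves.

text = "中"  # U+4E2D

gbk_bytes = text.encode("gbk")     # GBK / IBM-1386 family
utf8_bytes = text.encode("utf-8")

print(gbk_bytes)   # b'\xd6\xd0'
print(utf8_bytes)  # b'\xe4\xb8\xad'
```

An ASCII-mode transfer, by contrast, may apply code-page or line-ending translation in flight and corrupt double-byte sequences, which is why binary mode is the safe choice here.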

Posted: Thu Sep 10, 2009 10:51 am
by mydsworld
I don't know which system the data was generated on. I received the data as an .xls over e-mail. When I open the .xls, I can view the Chinese characters.

The code page used in the host conversion program is IBM-1386; the code page in Excel is GB 2312.

Posted: Thu Sep 10, 2009 11:10 am
by ArndW
So you can view the data on your PC correctly. What is your PC character set, and how are you copying the file to your UNIX system? Is it a binary FTP? If so, use your PC character set definition on the UNIX machine.

Posted: Sat Sep 19, 2009 8:04 pm
by mydsworld
I am sending the file in binary mode over FTP to the DS server. When I view the file in the Seq File stage, I see the double-byte characters as '???' etc. instead of the Chinese characters.

Please advise.

Posted: Sun Sep 20, 2009 5:30 am
by ArndW
Almost every single NLS thread here on DSXchange that deals with transformation or mapping problems has at least one post explicitly saying not to use "view data" from the Designer to detect or check multibyte characters. This thread is now no longer an exception. Use your favorite editor or tool that you know works with DBCS to see whether the characters are correct.
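In the same spirit (checking the bytes with a tool you trust rather than "view data"), here is a minimal try-decode sketch in Python. The function name and candidate list are my own, and this is only a heuristic, since a short byte sequence can sometimes be valid in more than one encoding:

```python
def guess_codec(raw: bytes, candidates=("utf-8", "gbk")) -> str:
    """Return the first candidate codec that decodes raw without error.
    Heuristic only: a short sample may be valid in several encodings."""
    for codec in candidates:
        try:
            raw.decode(codec)
            return codec
        except UnicodeDecodeError:
            continue
    return "unknown"

# '中文' encoded as GBK is not valid UTF-8, so the check falls through:
sample = b"\xd6\xd0\xce\xc4"
print(guess_codec(sample))  # gbk
```

On the UNIX side, `od -c file` or `file file` give a similarly trustworthy look at the raw bytes after transfer.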

Posted: Sun Sep 20, 2009 12:33 pm
by mydsworld
With UltraEdit, I am able to see the remote DS file with double-byte characters, so I assume they are there and, due to some 'unknown' issue, 'view data' will not show them.

My job design is like this:

Seq File -> Transformer -> DB2 API

I am using the Transformer just to map the file fields to the DB2 table. But I do not find the DB2 table populated with the multi-byte values.

In the Seq File stage I have used 'Varchar' fields with the extended property set, and have set the stage to use NLS map 'UTF-8'. The DB2 API stage is also set to NLS map 'UTF-8'.

I am getting the following warning:

APT_CombinedOperatorController,0: Invalid character(s) ([xAC]) found converting string (code point(s): [x00][x17]S[xAC]N[xAE][x90]?e[x1F][x90][x12][x90]@\ [x00] [x00] [x00]) from codepage UTF-8 to Unicode, substituting.
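This kind of warning can be reproduced outside DataStage: the engine is told the bytes are UTF-8, fails to decode some of them, and substitutes a replacement character, which is exactly what a lenient decode does. A sketch of that mismatch (the byte string below is a stand-in, not the actual job data):

```python
# GBK bytes for '中文', wrongly declared as UTF-8 -- the same kind of
# mismatch the APT_CombinedOperatorController warning reports when it
# says "substituting".
gbk_bytes = "中文".encode("gbk")          # b'\xd6\xd0\xce\xc4'

decoded = gbk_bytes.decode("utf-8", errors="replace")
print(decoded)                 # replacement characters, not Chinese
print("\ufffd" in decoded)     # True -- substitution happened
```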

Posted: Sun Sep 20, 2009 8:50 pm
by mydsworld
A couple of other observations:

1. I am able to insert the Chinese characters into the DB2 table (from Toad).

2. The DS job populates the DB2 table, but when I view the data in Toad, it is not Chinese.

3. Also, what NLS map should I choose for each stage in the job:

Seq File -> Transformer -> DB2 API

Posted: Mon Sep 21, 2009 1:26 am
by ArndW
First, don't look at the remote file; look at the file on the UNIX box after transfer. If the characters are still correct, then you have eliminated one possible error source. The Sequential File read stage will use the project default NLS setting. Assuming this is "UTF-8", it will read the file as if it were UTF-8 (which it isn't), and there you have your source of error. You will need to set the NLS attributes of the stage to the correct character set of the data.
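An alternative to changing the stage's NLS map is to convert the file to UTF-8 before the job reads it, equivalent to `iconv -f GBK -t UTF-8 in.txt > out.txt` on UNIX. A sketch of that conversion in Python (the function name is mine; apply it to the file's bytes before the job runs):

```python
# Re-encode GBK data as UTF-8 so a stage whose NLS map is UTF-8 can
# read it unchanged. Equivalent to: iconv -f GBK -t UTF-8
def gbk_to_utf8(raw: bytes) -> bytes:
    return raw.decode("gbk").encode("utf-8")

sample = b"\xd6\xd0\xce\xc4"          # '中文' in GBK
print(gbk_to_utf8(sample))            # b'\xe4\xb8\xad\xe6\x96\x87'
```

Either approach works; the key point from the post above stands: the declared character set and the actual bytes must agree somewhere along the pipeline.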

Posted: Mon Sep 21, 2009 6:16 am
by mydsworld
Thanks for your advice.

So, how do I determine the character set of the source file? I created the source file manually by copying a few lines of records with Chinese characters from a master file and then saving it with 'Unicode' encoding.

Also, for the DB2 API target stage, how do I know which character set (one that will accept Chinese) to use?
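Since the file was saved as 'Unicode' (in most Windows editors of that era this means UTF-16 LE with a byte-order mark), the BOM itself often reveals the encoding. A sketch using Python's `codecs` BOM constants; the function name is mine, and a file with no BOM still needs the try-decode or editor check discussed earlier:

```python
import codecs

def sniff_bom(raw: bytes) -> str:
    """Guess the encoding from a leading byte-order mark, if any."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE):   # Windows 'Unicode'
        return "utf-16-le"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return "no BOM (could be GBK, UTF-8 without BOM, ...)"

sample = codecs.BOM_UTF16_LE + "中".encode("utf-16-le")
print(sniff_bom(sample))  # utf-16-le
```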

Posted: Mon Sep 21, 2009 8:35 am
by ArndW
Earlier you indicated that the data was "IBM-1386", which would be simplified Chinese. Why not try using "ibm-1386_P100-2002" in your Sequential File stage -> NLS Map settings?
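For what it's worth, GB 2312 is a subset of GBK (which is essentially what the IBM-1386 code page covers), so characters from the Excel file's GB 2312 code page have the same bytes under either map. A quick check, using Python codecs as stand-ins for the DataStage NLS maps:

```python
# GB 2312 characters encode to the same bytes under GBK, so an
# IBM-1386/GBK NLS map also covers data from a GB 2312 source.
text = "中文"
assert text.encode("gb2312") == text.encode("gbk")
print(text.encode("gbk"))  # b'\xd6\xd0\xce\xc4'
```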