Handling DBCS/CJK characters
I am having trouble viewing a file that contains CJK characters. Please let me know the following:
1. I am able to view the Chinese characters in a local text editor. In which mode (ASCII or binary) should I FTP the file to the DS server?
2. The DS environment is NLS-enabled. In the DS job I am using a Seq File stage to read the file. Which NLS map should be used for reading the Chinese data?
Also, please let me know if any other settings are needed.
Thanks
1. What system is the data on: Windows or UNIX? What character set is defined on the system where you can view the data?
2. If you use a binary transfer from your system, then you need to use the same character set in DataStage as on your system.
So you can view the data correctly on your PC. What is your PC character set, and how are you copying the file to your UNIX system? Is it a binary FTP? If so, use your PC character set definition on the UNIX machine.
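As a rough sketch of a binary transfer, Python's standard `ftplib` can upload a file byte for byte with `storbinary`. The helper name `upload_binary` is made up for illustration, and the host, login, and file names in the usage comment are placeholders, not values from this thread.

```python
from ftplib import FTP


def upload_binary(ftp, local_path, remote_name):
    """Upload local_path unchanged (binary/image mode) as remote_name."""
    with open(local_path, "rb") as fh:
        # storbinary sends the bytes as-is. storlines (ASCII mode) rewrites
        # line endings and can damage multibyte data.
        return ftp.storbinary(f"STOR {remote_name}", fh)


# Usage (placeholder host, login, and file names):
# with FTP("ds-server.example.com") as ftp:
#     ftp.login("user", "password")
#     upload_binary(ftp, "customers.txt", "customers.txt")
```

Because the bytes arrive unchanged, the character set of the file on the UNIX machine is whatever it was on the PC, which is why the same character set has to be declared on the DataStage side.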
Almost every NLS thread here on DSXChange that deals with transformation or mapping problems has at least one post that explicitly says not to use "view data" from the Designer to detect or check multibyte characters. This thread is no longer an exception. Use your favorite editor or tool that you know works with DBCS to see if the characters are correct.
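If no DBCS-capable editor is at hand on the server, one alternative is to decode the file yourself and list the code points. The sketch below is only an illustration: `describe_file` is a hypothetical helper, and the path and encoding in the usage comment are placeholders for whatever your file actually uses.

```python
import unicodedata


def describe_file(path, encoding, limit=20):
    """Return (code point, Unicode name) pairs for the first `limit` characters."""
    # Opening in text mode raises UnicodeDecodeError if the encoding is wrong.
    with open(path, encoding=encoding) as fh:
        text = fh.read(limit)
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# Usage (placeholder path and encoding):
# for code_point, name in describe_file("customers.txt", "gb18030"):
#     print(code_point, name)
```

Names starting with "CJK UNIFIED IDEOGRAPH" indicate that the bytes really do decode to Chinese characters under the encoding you passed in.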
With UltraEdit I am able to see the double-byte characters in the remote DS file, so I assume they are there and that, due to some 'unknown' issue, 'view data' will not show them.
My job design is like this:
Seq File -> Transformer -> DB2 API
I am using the Transformer just to map the file fields to the DB2 table, but I do not find the DB2 table populated with the multibyte values.
In the Seq File stage I have used 'Varchar' fields with the extended property set, and I have set the stage to use NLS map 'UTF-8'. The DB2 API stage is also set to NLS map 'UTF-8'.
I am getting the following warning:
APT_CombinedOperatorController,0: Invalid character(s) ([xAC]) found converting string (code point(s): [x00][x17]S[xAC]N[xAE][x90]?e[x1F][x90][x12][x90]@\ [x00] [x00] [x00]) from codepage UTF-8 to Unicode, substituting.
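One way to read this warning is to take the bytes it lists and try them against candidate encodings. The sketch below is an interpretation, not a confirmed diagnosis. It transcribes the bytes from the message (printable characters such as S, N, ?, e, @ and \ stand for their ASCII byte values) and skips the leading [x00] on the assumption that it is the tail of a preceding two-byte character.

```python
# Bytes transcribed from the warning, minus the leading [x00] (assumed to be
# the second byte of a preceding UTF-16 character).
raw = bytes([
    0x17, 0x53,  # [x17] S
    0xAC, 0x4E,  # [xAC] N
    0xAE, 0x90,  # [xAE] [x90]
    0x3F, 0x65,  # ? e
    0x1F, 0x90,  # [x1F] [x90]
    0x12, 0x90,  # [x12] [x90]
    0x40, 0x5C,  # @ \
    0x20, 0x00, 0x20, 0x00, 0x20, 0x00,  # three space + [x00] pairs
])

try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    # 0xAC is a UTF-8 continuation byte with no lead byte in front of it.
    valid_utf8 = False

# Decode the same bytes as UTF-16 little-endian and drop the trailing spaces.
as_utf16le = raw.decode("utf-16-le").rstrip()
code_points = [f"U+{ord(ch):04X}" for ch in as_utf16le]
all_cjk = all(0x4E00 <= ord(ch) <= 0x9FFF for ch in as_utf16le)

print("valid UTF-8:", valid_utf8)
print("UTF-16LE code points:", code_points)
print("all in CJK Unified Ideographs block:", all_cjk)
```

The bytes are rejected as UTF-8 but decode cleanly as UTF-16 little-endian into seven code points in the CJK Unified Ideographs block followed by spaces. That pattern suggests the file is UTF-16 rather than UTF-8, though only the original file can confirm it.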
First, don't look at the remote file; look at the file on the UNIX box after transfer. If the characters are still correct, then you have removed a possible source of error. The sequential file read stage will use the project default NLS setting. Assuming this is "UTF-8", it will read the file as if it were UTF-8 (which it isn't), and there you have your source of error. You will need to set the NLS attributes of the stage to the correct character set of the data.
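A complementary check on the transfer itself, sketched here as an assumption rather than something suggested in this thread, is to compute a checksum of the file on the PC and again on the UNIX box. The helper name `sha256_of` is made up for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in binary chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read until fh.read() returns an empty bytes object at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the digest is the same on both machines, the transfer did not alter any bytes, and any remaining problem lies in how the stage interprets them.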
Thanks for your advice.
So, how do I determine the character set of the source file? I created the source file manually by copying a few records with Chinese characters from a master file and then saving it with 'Unicode' encoding.
Also, for the DB2 API target stage, how do I find out which character set it uses (one that will accept Chinese)?
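For the source file, a first rough check is to look for a byte order mark (BOM) and, failing that, to try a few candidate encodings. The sketch below makes several assumptions: `guess_encoding` is a hypothetical helper, the candidate list is only an example, and a successful decode does not prove the guess is right, since many byte sequences are valid in more than one character set.

```python
import codecs

# UTF-32 BOMs come first because the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(path, candidates=("utf-8", "gb18030")):
    """Return a plausible Python codec name for the file, or None."""
    with open(path, "rb") as fh:
        raw = fh.read()
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    # No BOM: return the first candidate that decodes without error.
    for name in candidates:
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None
```

Windows editors that offer a 'Unicode' save option usually write UTF-16 little-endian with a BOM, so a file saved that way would be expected to come back as "utf-16" here.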
Earlier you indicated that the data was "IBM-1386" which would be simplified Chinese. Why not try using "ibm-1386_P100-2002" in your sequential file Stage -> NLS Map settings?