Problem reading Chinese characters in a complex flat file

Post by **yimwai** » Thu Nov 04, 2010 5:08 am

I need to read a complex flat file and load the data into DB2.
The copybook is as follows:
01 :RCZSCB3A:-REC.
   05 :RCZSCB3A:-NAME PIC X(10).

And the table:
CREATE TABLE A (
  NAME CHAR(10)
)

The column may contain Chinese characters, so I set the column properties "Extended": "Unicode" and "NLS map": "ibm-1383", and set "Field width": 10.
In the EBCDIC file there is only one record:
0x0E 0x49 0xE1 0x55 0xD3 0x0F 0x40 0x40 0x40 0x40
(0x49 0xE1 0x55 0xD3 encodes "测试", meaning "test")
After inserting the record with the DB2 Enterprise stage, I ran
SELECT * FROM A WHERE NAME = '测试'
and it returns nothing. Also:
ASCII(SUBSTR(NAME,9,1)) = 0
ASCII(SUBSTR(NAME,10,1)) = 0x20 (space)
LENGTH(TRIM(NAME)) = 9
I guess it is caused by the "Pad character" property, which I set to 0x20.

If I insert the record with the Connector stage instead, the result is fine: the SQL statement above returns one record, and
ASCII(SUBSTR(NAME,9,1)) = 0x20 (space)
ASCII(SUBSTR(NAME,10,1)) = 0x20 (space)
LENGTH(TRIM(NAME)) = 4

What should I do to make the DB2 Enterprise stage insert "测试" followed by 0x20 0x20 0x20 0x20 0x20 0x20, rather than "测试" followed by 0x20 0x20 0x20 0x20 0x00 0x20?
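To see why the two rows behave differently under TRIM, here is a minimal Python sketch that rebuilds both CHAR(10) byte layouts and recomputes LENGTH(TRIM(NAME)). It assumes the DB2 database stores CHAR data in a GBK-compatible code page, so that "测试" takes 4 bytes; that code page is an assumption, not something stated in the thread. It also models TRIM as stripping only leading and trailing blanks (0x20), so a NUL byte stops the trimming.

```python
# Sketch: recompute LENGTH(TRIM(NAME)) for the two CHAR(10) byte layouts.
# Assumption: the database code page is GBK-compatible, so "测试" is 4 bytes.

NAME_BYTES = "测试".encode("gbk")  # 4 bytes under the GBK assumption


def char10(padding: bytes) -> bytes:
    """Build a 10-byte CHAR(10) value: the Chinese text followed by the padding."""
    value = NAME_BYTES + padding
    assert len(value) == 10, "CHAR(10) must hold exactly 10 bytes"
    return value


def trim_length(value: bytes) -> int:
    """Mimic LENGTH(TRIM(x)): strip leading/trailing blanks (0x20) only, count bytes."""
    return len(value.strip(b" "))


# Layout written by the Connector stage: six blanks after the text.
connector_row = char10(b"\x20" * 6)
# Layout written by the Enterprise stage: byte 9 is 0x00 instead of a blank.
enterprise_row = char10(b"\x20\x20\x20\x20\x00\x20")

for label, row in [("Connector", connector_row), ("Enterprise", enterprise_row)]:
    print(label, "byte 9 =", row[8], "byte 10 =", row[9],
          "LENGTH(TRIM) =", trim_length(row))
```

Under these assumptions it prints a trimmed length of 4 for the Connector layout and 9 for the Enterprise layout: TRIM removes the final blank, then stops at the 0x00, leaving the 4 text bytes, 4 blanks, and the NUL.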

Post by **ArndW** » Thu Nov 04, 2010 5:19 am

Since you are working with an X(10) fixed-width character field, no pad character is used anywhere. Likewise, your SELECT won't return any values, since you are comparing against just the 2 characters.

I am not sure whether DB2 CHAR(10) means 10 single-byte positions or 10 Unicode characters.

You have 10 bytes in your source. The initial 0x0E is the shift-out (SO) character that switches to double-byte mode, followed by your 4 bytes for the 2 Chinese characters and then 0x0F, the shift-in (SI) character that switches back to single-byte mode. That leaves four 0x40 bytes, which are EBCDIC spaces and serve as your padding.
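The byte layout described here can be walked through with a short Python sketch. Python's standard library has no codec for the EBCDIC Simplified Chinese mixed code pages, so the two double-byte codes are mapped with a hypothetical hard-coded table built only from the codes quoted in the original post (0x49E1 for 测, 0x55D3 for 试). Single-byte positions are decoded with Python's `cp037` (US EBCDIC) as a stand-in, which is an assumption; it agrees with the Chinese EBCDIC code pages that 0x40 is a space.

```python
# Sketch: decode a mixed single-byte/double-byte EBCDIC field, byte by byte.
# Assumptions: SBCS bytes decoded with cp037 (US EBCDIC, where 0x40 = space);
# DBCS pairs come from a tiny hypothetical table holding only the post's two codes.

SO = 0x0E  # shift-out: the following bytes are double-byte pairs
SI = 0x0F  # shift-in: back to single-byte mode

DBCS_TABLE = {  # hypothetical lookup, not a real code page table
    b"\x49\xe1": "测",
    b"\x55\xd3": "试",
}


def decode_mixed(field: bytes) -> str:
    """Decode SO/SI-delimited mixed EBCDIC; unknown DBCS pairs become '?'."""
    out = []
    in_dbcs = False
    i = 0
    while i < len(field):
        b = field[i]
        if b == SO:
            in_dbcs = True
            i += 1
        elif b == SI:
            in_dbcs = False
            i += 1
        elif in_dbcs:
            out.append(DBCS_TABLE.get(field[i:i + 2], "?"))
            i += 2
        else:
            out.append(bytes([b]).decode("cp037"))
            i += 1
    return "".join(out)


record = bytes([0x0E, 0x49, 0xE1, 0x55, 0xD3, 0x0F, 0x40, 0x40, 0x40, 0x40])
decoded = decode_mixed(record)
print(repr(decoded), "from", len(record), "bytes")
```

The 10 input bytes yield 6 characters: SO and SI produce no output, the four DBCS bytes give the 2 Chinese characters, and the four 0x40 bytes give 4 spaces.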

I suggest you add a dummy output link from a Transformer stage into a Peek stage and output the values of SEQ(In.Column[1,1]) through SEQ(In.Column[10,1]) to see what the string has been converted into.
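The same per-position check can be sketched outside DataStage in Python. This is only an illustration of the idea, not DataStage code: `ord()` plays the role of SEQ() on a single character, positions are 1-based to match the In.Column[i,1] substring notation, and the sample value (with one stray NUL in the padding) is hypothetical.

```python
# Sketch: list the character code at each position, like SEQ(In.Column[i,1]).


def position_codes(value: str) -> list[tuple[int, int]]:
    """Return (1-based position, character code) pairs for every character."""
    return [(pos, ord(ch)) for pos, ch in enumerate(value, start=1)]


def suspicious_positions(value: str) -> list[int]:
    """Positions holding NUL (code 0), which would normally be blank padding."""
    return [pos for pos, code in position_codes(value) if code == 0]


# Hypothetical converted value: two Chinese characters, padding with one stray NUL.
sample = "测试" + " " * 4 + "\x00" + " "
for pos, code in position_codes(sample):
    print(f"SEQ(In.Column[{pos},1]) = {code}")
print("NUL at positions:", suspicious_positions(sample))
```

Listing every position this way makes a 0 in the padding stand out immediately, which is exactly what the Peek output is meant to reveal.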