Character Conversion

ArndW · Post by **ArndW** » Wed Jul 31, 2013 6:40 am

This looks like a typical NLS issue.

First of all, you need to find out in what character set the data was stored in DB2. If the characters were converted to EBCDIC500, which doesn't support Thai characters, then your NLS information is --poof-- gone forever.
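To illustrate why such a conversion is irreversible, here is a minimal Python sketch. It assumes Python's `cp500` codec is a fair stand-in for EBCDIC500, and the sample string is made up. It is not DataStage code; it only shows what a lossy conversion does to the data.

```python
# Hypothetical sample value; any Thai text behaves the same way.
thai = "สวัสดี"

# A strict conversion refuses, because code page 500 has no slot for Thai letters.
try:
    thai.encode("cp500")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# A lossy conversion substitutes every unmappable character instead.
lossy = thai.encode("cp500", errors="replace")
restored = lossy.decode("cp500")

print(strict_ok)   # the strict conversion did not succeed
print(restored)    # only substitute characters come back
```

Once the substitution has happened, every Thai letter is stored as the same substitute byte, so no NLS map can tell them apart on the way back out.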

If the data was entered into the database in the character set it was typed in, i.e. with no conversion, then in DataStage you need to specify that character set as the NLS map for the source. Likewise, if the data was converted from native Thai to a character set that maps the Thai letters, then you need to specify that as the DB2 source character set.

Once DataStage knows what character set to use while reading, it can then perform mapping (where possible) to another character set.

sid19 · Post by **sid19** » Wed Jul 31, 2013 7:55 am

Hi,

In iSeries DB2, the character set is EBCDIC500, and the Thai characters are converted to EBCDIC500.

Thanks

arunkumarmm · Post by **arunkumarmm** » Wed Jul 31, 2013 8:03 am

Then I believe you should use the same NLS map while reading it as well. Did you try that already?

ArndW · Post by **ArndW** » Wed Jul 31, 2013 9:43 am

The EBCDIC 500 character set covers the Latin-1 character repertoire. It is a single-byte system where all the positions are occupied by characters, so trying to encode any additional ones would mean that a single byte could have multiple meanings. If you read a 0x44 byte, it could mean either the EBCDIC500 entry or perhaps a mapped Thai character, and there is no way for DataStage to determine which character it represents.
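The ambiguity can be shown with a small Python sketch. It assumes `cp500` approximates EBCDIC500 and uses `tis_620` as a stand-in Thai code page; the byte 0xA1 is chosen for illustration because TIS-620 places its Thai letters in the upper half of the byte range.

```python
# One stored byte, with no record of which character set produced it.
raw = bytes([0xA1])

# The same byte is a valid character under both maps.
as_ebcdic = raw.decode("cp500")
as_thai = raw.decode("tis_620")

print(repr(as_ebcdic), repr(as_thai))
```

Both decodes succeed, which is exactly the problem: the byte itself carries no hint about which interpretation was intended.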

Perhaps the original 1-byte Thai characters have just been written, 1-1, into the table, in which case you could use a Thai character set to read it. If that wasn't done, your data is corrupted and unusable.
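The 1-1 case can be sketched in Python as follows. This assumes the bytes were stored unchanged and that `tis_620` stands in for whichever Thai code page the source system really used (on an iSeries that would more likely be a Thai EBCDIC page); the sample string is hypothetical.

```python
# Hypothetical sample value written to the table byte for byte, with no conversion.
thai = "สวัสดี"
stored = thai.encode("tis_620")

# Reading with the wrong map yields garbage characters...
wrong = stored.decode("cp500")

# ...while reading with the map the data was written in restores the text.
right = stored.decode("tis_620")

# Because the stored bytes were never altered, even the misread text can be undone.
recovered = wrong.encode("cp500").decode("tis_620")

print(right == thai, recovered == thai)
```

The recovery works only because the bytes in the table are still the original ones; the choice of map at read time is then the only thing that needs correcting.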