Effect of Unicode on data?

Post by **Daddy Doma** » Tue Jun 21, 2005 1:18 am

G'day Folks,

I've got a parallel job that brings together two sources in a change data capture stage. I keep updates and inserts, and split them into different loads.

Problem: The change data capture stage, and a later lookup stage, do not recognise two records with ID '000085188' as the same. Because of this, an update is treated as an insert.
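Key matching compares exact values, so an ID that only *looks* the same does not match. The following minimal Python sketch shows how a padded key turns an update into an insert. It is a simplified model of key-based change detection, not how the DataStage Change Capture stage is actually implemented. The `classify` function and the sample records are hypothetical.

```python
# Hypothetical before/after datasets keyed by ID. The 'after' key carries
# trailing NUL (0x00) padding of the kind described in the post.
before = {"000085188": "old row"}
after = {"000085188\x00\x00": "new row"}

def classify(before, after):
    """Classify each 'after' record as an insert or an update by exact key match."""
    result = {}
    for key in after:
        # A key counts as an update only if the identical string exists in 'before'.
        result[key] = "update" if key in before else "insert"
    return result

changes = classify(before, after)
print(changes)  # the padded key is classified as an insert
```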

The office has looked at a hexadecimal dump of these "identical" IDs, and it appears that one of the values is padded with trailing characters. In hex these show up as zeros (0x00). Could this be some sort of null handling?
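A hex dump makes the padding visible. The following minimal Python sketch compares a clean ID with a copy padded with NULs. The fixed width of 12 is an assumption chosen for illustration. The digit `0` prints as `30` in hex, while the padding prints as `00`.

```python
# Two hypothetical copies of the same ID: one clean, one padded with NULs
# to an assumed fixed width of 12 characters.
clean = "000085188"
padded = clean.ljust(12, "\x00")

for label, value in (("clean", clean), ("padded", padded)):
    # bytes.hex(sep) (Python 3.8+) prints each byte as two hex digits.
    print(label, value.encode("ascii").hex(" "))

# The strings differ even though they may look identical when displayed.
print(clean == padded)
```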

One theory is that the UNICODE setting in DataStage is changing the data. Does anyone have any experience with this problem?

Cheers,

Zac.

Post by **ArndW** » Tue Jun 21, 2005 2:51 am

Most likely, at some point the column with the trailing 0x00 data was defined as a CHAR or a PIC X type field, and it was automatically padded with low-values (the null character, 0x00). You can use TRIM(CONVERT(CHAR(000),CHAR(032),YourColumn)) to remove these extraneous characters: CONVERT turns each 0x00 into a space, and TRIM then strips the spaces.
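For readers outside DataStage, the following Python function is a rough equivalent of `TRIM(CONVERT(CHAR(000),CHAR(032),YourColumn))`. It is a sketch, not DataStage's implementation, and the name `trim_nul_padding` is hypothetical. It only strips leading and trailing spaces. DataStage's Trim may also collapse internal runs of spaces, which does not matter for numeric IDs like these.

```python
def trim_nul_padding(value: str) -> str:
    """Approximate TRIM(CONVERT(CHAR(0), CHAR(32), value)):
    replace every NUL (0x00) with a space, then strip leading/trailing spaces."""
    return value.replace("\x00", " ").strip(" ")

padded = "000085188\x00\x00\x00"
clean = "000085188"
# After cleanup, both IDs compare equal and would match as keys.
print(trim_nul_padding(padded) == trim_nul_padding(clean))
```

Applying the cleanup to both sides before the change capture and lookup stages means that the keys compare equal.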

Addendum
Oops, I forgot to add that the Unicode setting is most likely irrelevant in this case.

Post by **Daddy Doma** » Tue Jun 21, 2005 5:12 pm

Thanks ArndW,

We've searched through the job and confirmed that this is the case. We will use the code you gave if we cannot remove the stage that changes the data types.

Regards,

Zac.