Error Profiling Japanese Character

sweta rai · Post by **sweta rai** » Thu Jan 29, 2009 2:28 am

Hi,

I am trying to run a column analysis on a column that contains Japanese characters, but the data is not being profiled and I get a warning like this:

pxbridge: [IIS-CONN-DAAPI-000067] Schema reconciliation detected a size mismatch for field SUPNM. When moving data from source field type STRING(min=0,max=135,charset=windows-1252) into target field type STRING(min=0,max=135,charset=UTF-8 ), truncation, loss of precision or data corruption can occur. Use STRING(min=0,max=405,charset=UTF-8) for target type

The data source is an Oracle database that is configured for UTF-8, so the Japanese characters are stored correctly there.

The driver which we are using is "IBM Oracle Wire Protocol".

What I understand from the warning above is that the driver we are using cannot read UTF-8 data and that its default charset is windows-1252.

Do I need to use a different driver to connect to the Oracle data source, or does something else need to be done?

Could somebody please suggest something? I'm stuck on this problem.
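
The size suggested in the warning (405 instead of 135) is consistent with allowing three bytes per character, which is what UTF-8 needs for typical Japanese text. A minimal Python sketch of that arithmetic follows; the sample string and the helper name `utf8_byte_length` are illustrative assumptions, not values taken from the SUPNM column:

```python
def utf8_byte_length(text: str) -> int:
    """Return the number of bytes needed to store text as UTF-8."""
    return len(text.encode("utf-8"))


sample = "日本語"  # illustrative sample, not data from the thread
char_count = len(sample)
byte_count = utf8_byte_length(sample)
print(char_count, byte_count)

# A field sized for 135 single-byte characters needs three times
# the space if every character takes three bytes in UTF-8.
source_max = 135
suggested_max = source_max * 3
print(suggested_max)
```

This only illustrates why a byte-sized field may need to grow; how the connector actually derives its suggestion is not stated in the warning.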

ray.wurlod · Post by **ray.wurlod** » Thu Jan 29, 2009 3:15 am

Code page 1252 (Western European) will not handle Japanese characters. You need to find out how the Japanese characters are encoded and specify the same mapping for DataStage to use. In this case, provided that the source is encoded using UTF-8, you need to change the source metadata.
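
As a rough illustration of the point above, outside DataStage entirely, this Python sketch shows that Japanese text cannot be encoded in windows-1252 and that UTF-8 bytes read as windows-1252 come out garbled. The sample string and the helper name `can_encode` are assumptions made for the example:

```python
def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


sample = "日本語"  # illustrative sample, not data from the thread
print(can_encode(sample, "utf-8"))
print(can_encode(sample, "cp1252"))  # cp1252 is Python's name for windows-1252

# Reading UTF-8 bytes as if they were windows-1252 does not fail loudly
# here; it silently produces different (garbled) characters.
misread = sample.encode("utf-8").decode("cp1252", errors="replace")
print(misread == sample)
```

A silent misread of this kind would be one way to end up with wrong profiling results rather than an outright error.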

sweta rai · Post by **sweta rai** » Thu Jan 29, 2009 3:43 am

Hi Ray,

Do I need to set the code page for Japanese characters while loading the data into the Oracle table?

OR

Do I need to set the NLS for Japanese characters in the IIS Administrator? Earlier it was set to UTF-8.

Please clarify.

ray.wurlod · Post by **ray.wurlod** » Thu Jan 29, 2009 2:46 pm

None of that is relevant until you can successfully READ the Japanese characters. You need to establish how these are encoded in your source table, and set the mapping in DataStage to correspond to that. Based on your opening question, the target already uses UTF-8, but I'd check that anyway.

sweta rai · Post by **sweta rai** » Thu Jan 29, 2009 10:58 pm

Ray, I'm afraid I did not understand exactly what you mean.

The DataStage job that populated the Oracle table from the source file has all the encoding set correctly.
So the data in the Oracle table is stored correctly, and we are able to read the Japanese data properly in the table.

Now, for profiling, we do not need to design any job. We just imported and bound that Oracle data source in Information Analyzer and ran column analysis. That may be where the Japanese characters are not being read properly, giving the wrong analysis result.

What exactly needs to be done?

ray.wurlod · Post by **ray.wurlod** » Thu Jan 29, 2009 11:04 pm

sweta rai wrote: for field SUPNM. When moving data from source field type STRING(min=0,max=135,charset=windows-1252)

This tells me that either the source table has not been loaded properly or that your metadata are inconsistent.

sweta rai · Post by **sweta rai** » Thu Jan 29, 2009 11:33 pm

My apologies.

Let me try to clarify:

The source Oracle table has the Japanese data correctly populated.
But when we import its metadata into the IA repository as the source, the import does not handle it correctly and converts it to charset=windows-1252.

The IA server, on its side, has its encoding (UTF-8) set correctly at the target, which is why the message says "into target field type STRING(min=0,max=135,charset=UTF-8 )".

Correct me if I'm wrong, and please help me with this.

ray.wurlod · Post by **ray.wurlod** » Fri Jan 30, 2009 12:55 am

Report the metadata import bug to your support provider, then change the setting within IA to UTF-8 manually.

sweta rai · Post by **sweta rai** » Fri Jan 30, 2009 2:19 am

Hi Ray,
Thanks for your effort.

We found the solution.

We need to add the following two parameters in the DataStage Administrator:

1. NLS_LANG, with its value set to AMERICAN_AMERICA.UTF8
2. DB2CODEPAGE, with its value set to 1208
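
For reference, the two settings above are plain name/value environment variables (1208 is IBM's code page identifier for UTF-8). In the thread they are set through the DataStage Administrator; the sketch below only shows the same pairs applied to a copy of a process environment. The function name `with_unicode_client_settings` is hypothetical and not part of any IBM or Oracle tool:

```python
import os
from typing import Dict, Optional


def with_unicode_client_settings(base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of an environment with the two settings from the thread."""
    env = dict(os.environ if base_env is None else base_env)
    # Oracle client language_territory.charset, value as given in the thread
    env["NLS_LANG"] = "AMERICAN_AMERICA.UTF8"
    # DB2 client code page, value as given in the thread (1208 = UTF-8)
    env["DB2CODEPAGE"] = "1208"
    return env


settings = with_unicode_client_settings({})
for name in sorted(settings):
    print(f"{name}={settings[name]}")
```

The function leaves the calling process's own environment untouched and returns a modified copy, which could be passed to a child process if needed.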

ray.wurlod · Post by **ray.wurlod** » Fri Jan 30, 2009 2:50 am

I had assumed that NLS_LANG was OK because you could access Japanese data in other tables satisfactorily.

You did not mention DB2 in your original question, so I assume that DB2CODEPAGE relates to your common metadata repository (XMETA) and its associated services or that your IA database (IADB) is DB2.

sweta rai · Post by **sweta rai** » Fri Jan 30, 2009 4:57 am

Yes, because the default analysis database (IADB) for IIS is DB2, and we need to specify its code page explicitly.

DSXchange
