Error Profiling Japanese Character

This forum contains ProfileStage posts and now focuses on newer versions of InfoSphere Information Analyzer.

Moderators: chulett, rschirm

sweta rai
Participant
Posts: 14
Joined: Tue Apr 01, 2008 6:56 am
Location: kolkata

Error Profiling Japanese Character

Post by sweta rai »

Hi,

I am trying to run a column analysis on a column that contains Japanese characters. The data cannot be profiled, and the analysis gives a warning like this:

pxbridge: [IIS-CONN-DAAPI-000067] Schema reconciliation detected a size mismatch for field SUPNM. When moving data from source field type STRING(min=0,max=135,charset=windows-1252) into target field type STRING(min=0,max=135,charset=UTF-8 ), truncation, loss of precision or data corruption can occur. Use STRING(min=0,max=405,charset=UTF-8) for target type



The data source is an Oracle database configured for UTF-8, so the Japanese characters are stored correctly there.

The driver we are using is "IBM Oracle Wire Protocol".

What I understand from the above warning is that the driver we are using cannot read UTF-8 data and defaults to the windows-1252 charset.

Do I need to use another driver to connect to the Oracle data source, or does something else need to be done?

Could somebody please advise? I'm stuck on this problem.
Sweta
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Code page 1252 (Western European) cannot represent Japanese characters. You need to find out how the Japanese characters are encoded and specify the same mapping for DataStage to use. In this case, provided that the source is encoded using UTF-8, you need to change the source metadata.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
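
To make the point about code page 1252 concrete, here is a minimal Python sketch. The sample string is hypothetical (the thread does not show any actual SUPNM values), and reading the warning's suggested max=405 as 3 x 135 is an interpretation based on UTF-8 needing up to three bytes for characters such as kanji and kana, not something the thread states.

```python
# Hypothetical sample value; the thread does not show real SUPNM data.
sample = "日本語の名前"


def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def utf8_bytes_per_char(text: str) -> list[int]:
    """Number of UTF-8 bytes needed for each character of text."""
    return [len(ch.encode("utf-8")) for ch in text]


fits_cp1252 = can_encode(sample, "cp1252")  # windows-1252 has no Japanese characters
fits_utf8 = can_encode(sample, "utf-8")     # UTF-8 covers all of Unicode
widths = utf8_bytes_per_char(sample)        # each of these characters needs 3 bytes

# Assumption: the source length of 135 is counted in characters, so a
# byte-counted UTF-8 target needs up to 3 bytes for each of them.
worst_case_bytes = 135 * 3
```

Under that assumption, `worst_case_bytes` matches the `max=405` that the warning recommends for the UTF-8 target type.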

Post by sweta rai »

Hi Ray,

Do I need to set the code page for Japanese characters while loading the data into the Oracle table?

OR

Do I need to set the NLS for Japanese characters in the IIS Administrator? Earlier it was set to UTF-8.

Please clarify.
Sweta

Post by ray.wurlod »

None of that is relevant until you can successfully READ the Japanese characters. You need to establish how these are encoded in your source table, and set the mapping in DataStage to correspond to that. Based on your opening question, the target already uses UTF-8, but I'd check that anyway.

Post by sweta rai »

Ray, I'm afraid I could not follow what exactly you mean.

The DataStage job that populated the Oracle table from the source file has all the encoding done correctly.
So the data in the Oracle table is stored correctly, and we can read the Japanese data properly in the table.

For profiling, we do not need to design any job. We just imported and bound that Oracle data source in Information Analyzer and ran column analysis. That may be where the Japanese characters are not being read properly, which gives the wrong analysis result.

What exactly needs to be done?
Sweta

Re: Error Profiling Japanese Character

Post by ray.wurlod »

sweta rai wrote: for field SUPNM. When moving data from source field type STRING(min=0,max=135,charset=windows-1252)
This tells me that either the source table has not been loaded properly or that your metadata are inconsistent.

Post by sweta rai »

My apologies.

Let me try to make it clearer:

The source Oracle table has the Japanese data correctly populated.
But when we import its metadata into the IA repository as the source, the import does not handle the data correctly and treats it as charset=windows-1252.

The IA server, at its end, has its encoding (UTF-8) set correctly for the target, which is why the message says "into target field type STRING(min=0,max=135,charset=UTF-8 )".

Correct me if I'm wrong, and please help me with this.
Sweta

Post by ray.wurlod »

Report the metadata import bug to your support provider, then change the setting within IA to UTF-8 manually.

Post by sweta rai »

Hi Ray,
Thanks for your effort.

We found the solution.

We needed to add the following two parameters in the DataStage Administrator:

1. NLS_LANG, with its value set to AMERICAN_AMERICA.UTF8
2. DB2CODEPAGE, with its value set to 1208
Sweta
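
For anyone who wants to confirm that a process actually sees these two values, here is a small Python sketch. It only inspects an environment mapping and does not configure DataStage or Information Analyzer. The helper names `find_mismatches` and `nls_charset` are made up for illustration; the expected values are the ones given in the post above.

```python
import os

# Values reported in this thread as the working settings.
EXPECTED = {
    "NLS_LANG": "AMERICAN_AMERICA.UTF8",  # Oracle client: LANGUAGE_TERRITORY.CHARSET
    "DB2CODEPAGE": "1208",                # DB2 code page 1208 is UTF-8
}


def find_mismatches(env: dict) -> dict:
    """Return {variable: actual value, or None if unset} for each variable
    whose value differs from the expected setting."""
    problems = {}
    for name, wanted in EXPECTED.items():
        actual = env.get(name)
        if actual != wanted:
            problems[name] = actual
    return problems


def nls_charset(nls_lang: str) -> str:
    """Extract the character set part of an NLS_LANG value
    (empty string if the value contains no '.')."""
    _, _, charset = nls_lang.partition(".")
    return charset


# Check the environment of the current process.
for name, actual in find_mismatches(dict(os.environ)).items():
    print(f"{name}: expected {EXPECTED[name]!r}, found {actual!r}")
```

Running this in the same environment as the engine processes (an assumption about how you launch it) prints one line per variable that is missing or set differently, and prints nothing when both match.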

Post by ray.wurlod »

I had assumed that NLS_LANG was OK because you could access Japanese data in other tables satisfactorily.

You did not mention DB2 in your original question, so I assume that DB2CODEPAGE relates to your common metadata repository (XMETA) and its associated services or that your IA database (IADB) is DB2.
Last edited by ray.wurlod on Fri Jan 30, 2009 2:24 pm, edited 1 time in total.

Post by sweta rai »

Yes, because the default analysis database (IADB) for IIS is DB2, and we need to set its code page explicitly.
Sweta