Error Profiling Japanese Character
Hi,
I am trying to run a column analysis on a column that contains Japanese characters, but it fails to profile the data and issues a warning like:
pxbridge: [IIS-CONN-DAAPI-000067] Schema reconciliation detected a size mismatch for field SUPNM. When moving data from source field type STRING(min=0,max=135,charset=windows-1252) into target field type STRING(min=0,max=135,charset=UTF-8 ), truncation, loss of precision or data corruption can occur. Use STRING(min=0,max=405,charset=UTF-8) for target type
The data source is an Oracle database configured for UTF-8, so the Japanese characters are stored there correctly.
The driver we are using is "IBM Oracle Wire Protocol".
What I understand from the warning is that the driver is not reading the data as UTF-8 and is defaulting to the windows-1252 character set.
Do I need to use a different driver to connect to the Oracle data source, or does something else need to be done?
Can somebody please advise? I'm stuck on this problem.
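As an aside, the 135 vs 405 in that warning is just UTF-8 arithmetic: windows-1252 stores one byte per character, while Japanese characters take three bytes each in UTF-8, so the engine sizes the UTF-8 field at three times the character count. A quick standalone Python check (illustrative only, not DataStage-specific):

```python
# Japanese characters occupy 3 bytes each in UTF-8, so a field sized
# for 135 single-byte characters needs up to 135 * 3 = 405 bytes,
# which matches the suggested STRING(min=0,max=405,charset=UTF-8).
text = "日本語"                     # 3 Japanese characters
utf8 = text.encode("utf-8")

print(len(text))      # 3 characters
print(len(utf8))      # 9 bytes (3 bytes per character)
print(135 * 3)        # 405
```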
Sweta
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Code page 1252 (US English) will not handle Japanese characters. You need to find out how the Japanese characters are encoded and specify the same mapping for DataStage to use. In this case, and provided that the source is encoded using UTF-8, you need to vary the source metadata.
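The point that code page 1252 cannot carry Japanese text can be demonstrated outside DataStage; a minimal Python sketch:

```python
# windows-1252 is a single-byte Western European code page with no
# code points for Japanese, so encoding Japanese text to it fails
# outright, while UTF-8 handles it fine.
text = "東京"

try:
    text.encode("cp1252")
except UnicodeEncodeError as err:
    print("cp1252 cannot encode:", err.reason)

print(text.encode("utf-8"))   # UTF-8 round-trips the characters
```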
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
None of that is relevant until you can successfully READ the Japanese characters. You need to establish how these are encoded in your source table, and set the mapping in DataStage to correspond to that. Based on your opening question, the target already uses UTF-8, but I'd check that anyway.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray, I'm afraid I could not follow exactly what you mean.
The DataStage job that populated that Oracle table from the source file did all the encoding correctly.
So the data in the Oracle table are stored correctly, and we can read the Japanese data properly in the table.
For profiling we do not need to design any job. We simply imported and bound that Oracle data source in Information Analyzer and ran column analysis, and that is where the Japanese characters are apparently not being read properly, giving wrong analysis results.
What exactly needs to be done ?
Sweta
Re: Error Profiling Japanese Character
sweta rai wrote: for field SUPNM. When moving data from source field type STRING(min=0,max=135,charset=windows-1252)
This tells me that either the source table has not been loaded properly or that your metadata are inconsistent.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
My apologies.
Let me try to clarify:
The source Oracle table has the Japanese data correctly populated.
But when we import that metadata into the IA repository as the source, it is not importing the data correctly and is treating it as charset=windows-1252.
The IA server has its target encoding (UTF-8) set correctly, which is why the message says "into target field type STRING(min=0,max=135,charset=UTF-8)".
Correct me if I'm wrong, and please help me with this.
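A quick way to see what a windows-1252 reader makes of UTF-8 bytes, which is what a wrong charset declaration in the imported metadata would cause (standalone Python, illustrative only):

```python
# If the table's bytes are UTF-8 but the declared charset is
# windows-1252, each UTF-8 byte is reinterpreted as a separate Latin
# character and the Japanese text turns into mojibake.
raw = "日本".encode("utf-8")       # the bytes Oracle actually stores

wrong = raw.decode("cp1252")       # what a windows-1252 reader sees
right = raw.decode("utf-8")        # what a UTF-8 reader sees

print(wrong)   # garbled Latin characters: æ—¥æœ¬
print(right)   # 日本
```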
Sweta
I had assumed that NLS_LANG was OK because you could access Japanese data in other tables satisfactorily.
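For reference, NLS_LANG has the form LANGUAGE_TERRITORY.CHARACTERSET, and its last component names the client character set; a tiny illustrative parser (the values shown are examples, not taken from this system):

```python
# NLS_LANG looks like LANGUAGE_TERRITORY.CHARACTERSET, e.g.
# AMERICAN_AMERICA.AL32UTF8. For Japanese data against a UTF-8
# database, the client should report a Unicode character set such as
# AL32UTF8, not a Western one like WE8MSWIN1252.

def charset_of(nls_lang: str) -> str:
    """Return the character-set component of an NLS_LANG value."""
    return nls_lang.rsplit(".", 1)[-1]

print(charset_of("AMERICAN_AMERICA.AL32UTF8"))      # AL32UTF8
print(charset_of("AMERICAN_AMERICA.WE8MSWIN1252"))  # WE8MSWIN1252
```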
You did not mention DB2 in your original question, so I assume that DB2CODEPAGE relates to your common metadata repository (XMETA) and its associated services or that your IA database (IADB) is DB2.
Last edited by ray.wurlod on Fri Jan 30, 2009 2:24 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.