
Convert data from LATIN1 to UTF8

Posted: Thu May 15, 2008 1:03 am
by nasimul
Hi,
Our requirement is to convert data from LATIN1 to UTF8 format. How can this be implemented through DataStage?
Please let me know if there is any other way to approach this problem.

Thanks,
Nasimul

Posted: Thu May 15, 2008 1:39 am
by ArndW
This conversion can be done automatically in DataStage if you have NLS installed: just declare your input as LATIN1 and the output as UTF8.

Posted: Thu May 15, 2008 5:07 am
by nasimul
Hi,
Is there any other way to convert data from LATIN1 to UTF8?
NLS is actually not enabled in our DataStage installation.
Please let me know of any other solution.

Thanks,
Nasimul

Posted: Fri May 16, 2008 6:38 am
by ray.wurlod
How do you expect to manipulate character sets if the tool for doing so is disabled?

The answer is no.

You can try using NVarChar as a data type, but no guarantees are made. It may or may not work when NLS is disabled.

Posted: Mon May 19, 2008 7:58 am
by nasimul
Hi,

Is there any DB2 function to convert data from LATIN1 to UTF8?

Posted: Mon May 19, 2008 9:00 am
by ArndW
No, DB2 does not have a function to convert between character sets.

Posted: Mon May 19, 2008 11:19 am
by jdmiceli
If you aren't using NLS for whatever reason, the safest way would be to extract the data from the LATIN-1 source and dump it to a sequential file on a UTF-8 box. I think this will create the file in the codepage format you want, and then you could bulk load the file into DB2. It is a one-off fix, but I think it will work.

Posted: Wed Jun 04, 2008 1:31 am
by ArndW
If you do a "diff" on the 2 files are they identical?

Posted: Fri Jun 06, 2008 12:40 am
by ankita
Yes, we have tried diff, but it's not showing any difference other than 2 rejected records. This rejection is due to business logic, which is fine.

Can anyone offer a suggestion?

Thanks,
Ankita

Posted: Sun Jun 08, 2008 2:48 am
by hello105
On the UNIX platform, there are commands you can invoke directly to convert a Latin-1 file to a UTF-8 file.
nasimul wrote:
Hi,
Our requirement is to convert data from LATIN1 to UTF8 format. How can this be implemented through DataStage?
Please let me know if there is any other way to approach this problem.

Thanks,
Nasimul

Posted: Sun Jun 08, 2008 7:20 am
by ArndW
The only conversion command on UNIX that I know of is "dd", which can perform ASCII to EBCDIC conversion and vice versa.
If you declare your source file as 8859 and output to UTF-8, and a "diff" command shows no difference, then something is wrong in your settings, since DataStage will have done some conversion. I suspect your source stage settings aren't what you think.
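As a rough illustration of why a "diff" should show a difference after a real conversion, here is a small Python sketch. The sample strings are made up for the example and are not data from this thread. Characters outside the ASCII range take one byte in Latin-1 but two bytes in UTF-8, whereas pure ASCII text is byte-for-byte identical in both encodings, so identical files are only expected when the data contains no non-ASCII characters at all.

```python
# Illustration only: the sample strings are hypothetical, not data from the thread.
accented = "caf\u00e9"   # "cafe" with an e-acute, a typical Latin-1 character
plain = "cafe"           # ASCII-only text

latin1_bytes = accented.encode("latin-1")  # e-acute becomes the single byte 0xE9
utf8_bytes = accented.encode("utf-8")      # e-acute becomes the two bytes 0xC3 0xA9

# Non-ASCII text: the two encodings produce different bytes, so diff would report a change.
accented_differs = latin1_bytes != utf8_bytes

# ASCII-only text: both encodings produce the same bytes, so diff would report nothing.
plain_identical = plain.encode("latin-1") == plain.encode("utf-8")

print(latin1_bytes, utf8_bytes, accented_differs, plain_identical)
```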

Posted: Sun Jun 08, 2008 7:54 am
by chulett
Actually there is a command; I'd not heard of it until someone else mentioned it here some time ago. And I tend to forget about it because of the name: iconv. :wink:
man iconv wrote:
iconv -f fromcode -t tocode [file ...]

iconv converts the encoding of characters in the input file from the fromcode code set to the tocode code set, and writes the results to standard output.
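Following that synopsis, a Latin-1 to UTF-8 conversion would look something like "iconv -f ISO-8859-1 -t UTF-8 input > output", although the accepted code set names vary by platform. Where iconv is not available, the same re-encoding can be sketched in Python. The function name and the temporary file names below are hypothetical and exist only for the demonstration; this is a minimal sketch, not a tested replacement for the DataStage NLS route.

```python
import os
import tempfile


def convert_latin1_to_utf8(src_path, dst_path):
    """Re-encode a Latin-1 (ISO-8859-1) text file as UTF-8, line by line."""
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, "r", encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


# Small demonstration with a throwaway file (hypothetical names and content).
tmp_dir = tempfile.mkdtemp()
src_file = os.path.join(tmp_dir, "latin1.txt")
dst_file = os.path.join(tmp_dir, "utf8.txt")

with open(src_file, "wb") as f:
    f.write(b"caf\xe9\n")          # 0xE9 is e-acute in Latin-1

convert_latin1_to_utf8(src_file, dst_file)

with open(dst_file, "rb") as f:
    converted = f.read()           # e-acute is now the two bytes 0xC3 0xA9

print(converted)
```

Note that this only works if the source really is Latin-1. Every byte value is valid in Latin-1, so a wrongly declared source encoding produces no error, just wrong characters in the output.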

Posted: Sun Jun 08, 2008 8:01 am
by ArndW
Cool!

Posted: Wed Jul 02, 2008 3:45 am
by ankita
Hi All,

Thanks for these UNIX tips!
But in our project we have NLS installed, so we would like to do this conversion (Latin-1 to UTF-8) through DataStage 8.0.
We have set the job-level NLS map to ISO-8859-1 and are able to read Latin-1 characters properly.
The job looks like:
Source Seq file -> Transformer -> Target Seq file

Other properties are:
Default collation locale for stages: Project(OFF) at job level
NLS Locale at Transformer: Project(OFF)
Target Seq file NLS: UTF-8

But in the end the job is not converting the Latin-1 characters to UTF-8 in the output file. All Latin-1 characters are passed through as is. Can you please suggest how to solve this?

Thanks,
Ankita
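One way to check independently of DataStage whether a target file really contains UTF-8 is to inspect its raw bytes. The Python sketch below is only a heuristic under stated assumptions: the function name and its return labels are hypothetical, and a file that passes as valid UTF-8 could in rare cases still be Latin-1 text that happens to form valid UTF-8 sequences.

```python
def classify_bytes(data):
    """Roughly classify raw bytes as ASCII-only, valid UTF-8, or not UTF-8."""
    # If every byte is below 0x80 the data is plain ASCII, which looks the
    # same in Latin-1 and UTF-8, so no conversion would be visible.
    if all(b < 0x80 for b in data):
        return "ascii-only"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes >= 0x80 that do not form valid UTF-8 sequences: the data is
        # probably still in a single-byte encoding such as Latin-1.
        return "not-utf8"
    return "utf-8"


# Hypothetical sample values, not taken from the job described above.
print(classify_bytes(b"cafe"))          # ASCII only
print(classify_bytes(b"caf\xe9"))       # Latin-1 e-acute, invalid as UTF-8
print(classify_bytes(b"caf\xc3\xa9"))   # UTF-8 e-acute
```

To apply it to a real file, read the file in binary mode and pass the bytes to the function. A "not-utf8" result on the target file would suggest that the characters were passed through unconverted.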