Convert data from LATIN1 to UTF8

nasimul
Participant
Posts: 37
Joined: Wed Jan 25, 2006 1:38 am

Convert data from LATIN1 to UTF8

Post by nasimul »

Hi,
Our requirement is to convert data from LATIN1 to UTF8 format. How can this be implemented through DataStage?
Please let me know if there is any other way to approach this problem.

Thanks,
Nasimul
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

This conversion can be done automatically in DataStage if you have NLS installed. Just declare your input as LATIN1 and the output as UTF8.
nasimul
Participant
Posts: 37
Joined: Wed Jan 25, 2006 1:38 am

Re: Convert data from LATIN1 to UTF8

Post by nasimul »

Hi,
Is there any other way to convert LATIN1 format to UTF8?
NLS is actually not enabled in our DataStage installation.
Please let me know of any other solution.

Thanks,
Nasimul
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

How do you expect to manipulate character sets if the tool for doing so is disabled?

The answer is no.

You can try using NVarChar as a data type, but no guarantees are made. It may or may not work when NLS is disabled.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nasimul
Participant
Posts: 37
Joined: Wed Jan 25, 2006 1:38 am

Post by nasimul »

Hi,

Is there any DB2 function to convert data from LATIN1 to UTF8 format?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

No, DB2 does not have a function to convert between character sets.
jdmiceli
Premium Member
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Post by jdmiceli »

If you aren't using NLS for whatever reason, the safest way would be to extract the data from LATIN-1 source and dump the data to a sequential file on a UTF-8 box. I think this will create the file in the codepage format you want and then you could bulk load the file to DB2. It is a one-off fix, but I think it will work.
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you do a "diff" on the 2 files are they identical?
ankita
Participant
Posts: 57
Joined: Sun Nov 13, 2005 11:17 pm

Post by ankita »

Yes, we have tried diff, but it's not showing any difference other than 2 rejected records. This rejection is due to business logic, which is fine.

Can anyone give any suggestion ?

Thanks,
Ankita
hello105
Participant
Posts: 9
Joined: Mon Sep 03, 2007 9:53 pm
Location: ShangHai,China

Re: Convert data from LATIN1 to UTF8

Post by hello105 »

On the UNIX platform there are commands you can invoke directly to convert a Latin-1 file to a UTF-8 file.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The only conversion command on UNIX that I know of is "dd", which can perform ASCII to EBCDIC and vice-versa.
If you declare your source file as 8859 and output to UTF-8, and a "diff" command shows no difference, then something is wrong in your settings, since DataStage will have done some conversion. I suspect your source stage settings aren't what you think.
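To illustrate why a "diff" should show a difference after a real conversion, here is a minimal Python sketch (Python is not part of the thread; the sample strings are made up). It also shows the one case where identical files are expected: data that contains only ASCII characters is byte-for-byte the same in Latin-1 and UTF-8.

```python
# Latin-1 stores every character in one byte; UTF-8 needs two bytes
# for the characters U+0080..U+00FF (accented letters and so on).
sample = "caf\u00e9"  # "cafe" with an acute accent on the e

latin1_bytes = sample.encode("latin-1")
utf8_bytes = sample.encode("utf-8")

print(latin1_bytes)  # b'caf\xe9'
print(utf8_bytes)    # b'caf\xc3\xa9'

# Pure ASCII text is encoded identically in both character sets,
# so a diff of such files would legitimately show no difference.
ascii_only = "cafe"
same_for_ascii = ascii_only.encode("latin-1") == ascii_only.encode("utf-8")
print(same_for_ascii)  # True
```

So if the source really contains non-ASCII Latin-1 characters and the two files are still identical, no conversion took place.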
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Actually there is a command, I'd not heard of it until someone else mentioned it here some time ago. And I tend to forget about it because of the name - iconv. :wink:
man iconv wrote:
iconv -f fromcode -t tocode [file ...]

iconv converts the encoding of characters in the input file from the fromcode code set to the tocode code set, and writes the results to standard output.
-craig

"You can never have too many knives" -- Logan Nine Fingers
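For anyone without iconv on their box, the same re-encoding can be sketched in a few lines of Python. This is only an illustration, not something proposed in the thread: the function name `latin1_to_utf8` and the file names are hypothetical, and it assumes the input really is ISO-8859-1.

```python
import os
import tempfile


def latin1_to_utf8(src_path, dst_path):
    """Re-encode a Latin-1 (ISO-8859-1) text file as UTF-8."""
    # newline="" leaves the line endings exactly as they are in the source.
    with open(src_path, "r", encoding="iso-8859-1", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


# Small self-check on a throwaway file (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input_latin1.txt")
    dst = os.path.join(tmp, "output_utf8.txt")
    with open(src, "wb") as f:
        f.write(b"caf\xe9\n")  # 0xE9 is e-acute in Latin-1
    latin1_to_utf8(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted)  # b'caf\xc3\xa9\n'
```

Every byte sequence is valid Latin-1, so the read side never fails; if the input is actually in some other code page, the output will be well-formed UTF-8 with the wrong characters in it.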
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Cool!
ankita
Participant
Posts: 57
Joined: Sun Nov 13, 2005 11:17 pm

Post by ankita »

Hi All,

Thanks for these unix tips !
But we have NLS installed in our project, so we would like to do this conversion (Latin-1 to UTF-8) through DataStage V 8.0.
We have set the job-level NLS map to ISO-8859-1 and are able to read Latin-1 characters properly.
The job looks like this:
Source Seq file -> Transformer -> Target Seq file

Other properties are:
Default collation locale for stages : Project(OFF) at job level
NLS Locale at transformer : Project(OFF)
Target Seq file NLS : UTF-8.

But in the end the job is not converting the Latin-1 characters to UTF-8 in the output file. All Latin-1 characters are passed through as is. Can you please suggest how to solve this?

Thanks,
Ankita
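One way to check what actually landed in the target file, independent of any viewer or terminal settings, is to look at the raw bytes. The sketch below is a rough heuristic in Python, not a DataStage feature; the function name `classify_encoding` and its return labels are made up for illustration.

```python
def classify_encoding(data: bytes) -> str:
    """Roughly classify raw bytes as ASCII, valid UTF-8, or neither."""
    if all(b < 0x80 for b in data):
        # Pure ASCII is byte-identical in Latin-1 and UTF-8.
        return "ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # High bytes that do not form valid UTF-8 sequences;
        # unconverted Latin-1 data is one likely cause.
        return "not-utf-8"


print(classify_encoding(b"caf\xe9"))      # not-utf-8
print(classify_encoding(b"caf\xc3\xa9"))  # utf-8
print(classify_encoding(b"cafe"))         # ascii
```

To use it on a file, read the file in binary mode and pass the bytes in. The check is only a heuristic: some Latin-1 byte sequences happen to be valid UTF-8 as well, so a "utf-8" result is strong evidence rather than proof.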