Converting UTF8 to ASCII file and back to UTF8

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

premreddyb
Participant
Posts: 6
Joined: Thu Jun 10, 2004 3:32 pm

Converting UTF8 to ASCII file and back to UTF8

Post by premreddyb »

Hi,

I have a requirement where I need to convert a UTF-8 file to ASCII and then convert it back to UTF-8.

Could anyone please help me with how to do this using DataStage.

Regards
BRP
1stpoint
Participant
Posts: 165
Joined: Thu Nov 13, 2003 2:10 pm
Contact:

Post by 1stpoint »

This can be nicely done by writing a Python script to handle the decoding and encoding of the data. DataStage is a data migration tool and is not really designed for this type of encoding/decoding. By writing a script in Python you can ensure that it is platform neutral.

See:
http://www.opendocspublishing.com/pyqt/x2183.htm

and

http://pydoc.org/2.1/encodings.utf_8.html
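
Something along these lines would do it (an untested sketch; the file names and the backslashreplace escape scheme are just one choice, and note the caveat about literal backslashes):

    # utf8_roundtrip.py -- sketch of a UTF-8 -> ASCII -> UTF-8 round trip.
    # Non-ASCII characters are written out as backslash escapes so they can
    # be restored later; any other reversible escape scheme would also work.

    def utf8_to_ascii(src_path, dst_path):
        with open(src_path, encoding="utf-8") as src:
            text = src.read()
        # backslashreplace turns e.g. the character for "sun" into the
        # seven ASCII bytes "\u65e5"
        with open(dst_path, "wb") as dst:
            dst.write(text.encode("ascii", errors="backslashreplace"))

    def ascii_to_utf8(src_path, dst_path):
        with open(src_path, "rb") as src:
            data = src.read()
        # unicode_escape undoes the backslash escapes written above.
        # Caveat: literal backslashes in the source text are NOT escaped by
        # backslashreplace, so a production job needs a stricter scheme.
        text = data.decode("unicode_escape")
        with open(dst_path, "w", encoding="utf-8") as dst:
            dst.write(text)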

Best of luck.
premreddyb
Participant
Posts: 6
Joined: Thu Jun 10, 2004 3:32 pm

Post by premreddyb »

Hi,
Can you please explain how I integrate scripts with DataStage?
Do I need to run separate scripts to do the conversion, and then use the converted files in my DataStage jobs?

Regards
Prem


1stpoint wrote:This can be nicely done by writing a Python script to handle the decoding and encoding of the data. DataStage is a data migration tool and is not really designed for this type of encoding/decoding. By writing a script in Python you can ensure that it is platform neutral.

See:
http://www.opendocspublishing.com/pyqt/x2183.htm

and

http://pydoc.org/2.1/encodings.utf_8.html

Best of luck.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Do you have NLS (National Language Support) enabled in DataStage?
If so you can use mapping on the inputs and outputs. Internally, if NLS is enabled, DataStage uses an idiosyncratic UTF-8 encoding of Unicode.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jwhyman
Premium Member
Posts: 13
Joined: Fri Apr 09, 2004 2:18 am

Post by jwhyman »

There is no need to convert from ASCII to UTF-8: by definition, ASCII is invariant under UTF-8. The code points 0x00-0x7F are encoded as the single bytes 0x00-0x7F. This is precisely why UTF-8 is so widely used.
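
You can verify this in a couple of lines of Python:

    # Any pure-ASCII byte sequence is already valid UTF-8, byte for byte.
    data = b"Hello, DataStage!"        # bytes 0x00-0x7F only
    assert data.decode("ascii") == data.decode("utf-8")
    assert "Hello, DataStage!".encode("utf-8") == data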
premreddyb
Participant
Posts: 6
Joined: Thu Jun 10, 2004 3:32 pm

UTF8

Post by premreddyb »

If I use the UTF8 NLS map with a file that contains Japanese characters as input, then in my output EBCDIC file those characters are replaced by "?" symbols.

Regards,
Prem
ray.wurlod wrote:Do you have NLS (National Language Support) enabled in DataStage?
If so you can use mapping on the inputs and outputs. Internally, if NLS is enabled, DataStage uses an idiosyncratic UTF-8 encoding of Unicode.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you have Japanese characters in the input you will need the correct Japanese map to translate them when reading the file. There are many different encodings of Japanese characters; sometimes we even find that different columns are encoded differently, or that the map changes during a data stream (triggered by shift-in/shift-out characters).

There is no guarantee that using a different map when writing will magically "translate" the characters into a different encoding. Not all characters are represented in every encoding.
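
Here is a quick Python illustration of what is happening (cp500 merely stands in for whichever EBCDIC code page the target actually uses):

    # Japanese characters have no representation in a single-byte EBCDIC
    # code page, so a lossy conversion has to substitute something.
    text = "abc" + "\u65e5\u672c\u8a9e"               # "abc" + Japanese
    ebcdic = text.encode("cp500", errors="replace")   # Japanese becomes "?"
    print(ebcdic.decode("cp500"))                     # -> abc???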

DataStage is not intended as a translation tool.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
premreddyb
Participant
Posts: 6
Joined: Thu Jun 10, 2004 3:32 pm

Post by premreddyb »

Hi Ray,

Could you please pass on some examples if you have.

Regards,
Prem



ray.wurlod wrote:If you have Japanese characters in the input you will need the correct Japanese map to translate them when reading the file. There are many different encodings of Japanese characters; sometimes we even find that different columns are encoded differently, or that the map changes during a data stream (triggered by shift-in/shift-out characters).

There is no guarantee that using a different map when writing will magically "translate" the characters into a different encoding. Not all characters are represented in every encoding.

DataStage is not intended as a translation tool.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I have none. I am not working with Japanese data at the moment.
You might like to ask Ascential support - through your support provider, of course.

It is always a problem to be certain about how Japanese data are encoded. It is rare that the data owner knows for sure. Take a look at the drop-down list of possible mappings to see what I mean.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
1stpoint
Participant
Posts: 165
Joined: Thu Nov 13, 2003 2:10 pm
Contact:

solution

Post by 1stpoint »

We have had this problem in the past, and Python will accurately encode and decode the Japanese UTF-8 characters. This is done in a pre-load process called by either a batch file or a Unix shell script. The link above actually has a working UTF-8 conversion program and shows how to implement it.
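
For example, a sketch of a command-line wrapper (the script and module names here are made up; it could be invoked from the staging shell script, or from a before-job ExecSH call):

    # preload_convert.py -- hypothetical pre-load wrapper; call it as
    #   python preload_convert.py to-ascii in.utf8  out.ascii
    #   python preload_convert.py to-utf8  in.ascii out.utf8
    import sys
    from utf8_roundtrip import utf8_to_ascii, ascii_to_utf8  # sketch module above

    if __name__ == "__main__":
        if len(sys.argv) != 4:
            sys.exit("usage: preload_convert.py to-ascii|to-utf8 SRC DST")
        mode, src, dst = sys.argv[1:]
        if mode == "to-ascii":
            utf8_to_ascii(src, dst)
        elif mode == "to-utf8":
            ascii_to_utf8(src, dst)
        else:
            sys.exit("unknown mode: " + mode)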