Converting UTF8 to ASCII file and back to UTF8
-
- Participant
- Posts: 6
- Joined: Thu Jun 10, 2004 3:32 pm
Hi,
I have a requirement where I need to convert a UTF-8 file to an ASCII file and then convert it back to UTF-8.
Could anyone please help me with how to do this using DataStage?
Regards
BRP
This can be nicely done by writing a Python script to handle the decoding and encoding of the data. DataStage is a data migration tool and is not really designed for this type of encoding/decoding. By writing a script in Python you can ensure that it is platform neutral.
See:
http://www.opendocspublishing.com/pyqt/x2183.htm
and
http://pydoc.org/2.1/encodings.utf_8.html
Best of luck.
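To make the suggestion concrete, here is a minimal sketch of what such a Python script could look like. The function names and the backslash-escape scheme are my own choices, not from the links above, and the round trip assumes the source text contains no literal backslashes (which would collide with the escapes):

```python
# Round-trip a UTF-8 file through an ASCII-only representation.
# Non-ASCII characters are written as \xNN / \uXXXX escapes on the
# way out and decoded back on the way in.

def utf8_to_ascii(src, dst):
    """Read UTF-8 text and write a pure-ASCII file, escaping any
    non-ASCII characters as backslash sequences."""
    with open(src, encoding="utf-8") as f:
        text = f.read()
    with open(dst, "w", encoding="ascii", errors="backslashreplace") as f:
        f.write(text)

def ascii_to_utf8(src, dst):
    """Reverse the escaping and write the text back out as UTF-8."""
    with open(src, encoding="ascii") as f:
        escaped = f.read()
    # unicode_escape interprets the \xNN and \uXXXX sequences; the
    # remaining bytes are plain ASCII, so they pass through unchanged.
    text = escaped.encode("ascii").decode("unicode_escape")
    with open(dst, "w", encoding="utf-8") as f:
        f.write(text)
```

The intermediate file is genuinely ASCII, so any tool in between that chokes on multi-byte characters can handle it, and the second function restores the original UTF-8 content.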
-
- Participant
- Posts: 6
- Joined: Thu Jun 10, 2004 3:32 pm
Hi,
Can you please explain how I integrate scripts with DataStage?
Do I need to run separate scripts to convert the files, and then use the converted files in my DataStage jobs?
Regards
Prem
1stpoint wrote:This can be nicely done by writing a Python script to handle the decoding and encoding of the data. DataStage is a data migration tool and is not really designed for this type of encoding/decoding. By writing a script in Python you can ensure that it is platform neutral.
See:
http://www.opendocspublishing.com/pyqt/x2183.htm
and
http://pydoc.org/2.1/encodings.utf_8.html
Best of luck.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Do you have NLS (National Language Support) enabled in DataStage?
If so you can use mapping on the inputs and outputs. Internally, if NLS is enabled, DataStage uses an idiosyncratic UTF-8 encoding of Unicode.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 6
- Joined: Thu Jun 10, 2004 3:32 pm
UTF8
If I use the UTF8 NLS map on an input file that contains Japanese characters, then in my output EBCDIC file the characters are replaced by "?" symbols.
Regards,
Prem
ray.wurlod wrote:Do you have NLS (National Language Support) enabled in DataStage?
If so you can use mapping on the inputs and outputs. Internally, if NLS is enabled, DataStage uses an idiosyncratic UTF-8 encoding of Unicode.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
If you have Japanese characters in the input you will need the correct Japanese map to translate them when reading the file. There are many different encodings of Japanese characters; sometimes we even find that different columns are encoded differently, or that the map changes during a data stream (triggered by shift-in/shift-out characters).
There is no guarantee that using a different map when writing will magically "translate" the characters into a different encoding. Not all characters are represented in every encoding.
DataStage is not intended as a translation tool.
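The "?" symptom reported above is exactly what a lossy encoding conversion produces. A quick illustration in Python (cp500, a common EBCDIC code page with no Japanese characters, stands in here; the actual code page in use may be different):

```python
# Japanese characters have no representation in single-byte EBCDIC
# code pages such as cp500, so a lossy conversion substitutes a "?"
# for each one instead of translating it.
text = "日本語"
ebcdic = text.encode("cp500", errors="replace")  # each char becomes "?"
print(ebcdic.decode("cp500"))  # the original characters are gone
```

To keep the Japanese characters in the output you need a target encoding that can actually represent them (for example one of the Japanese EBCDIC code pages), not just a different map on the write side.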
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 6
- Joined: Thu Jun 10, 2004 3:32 pm
Hi Ray,
Could you please pass on some examples, if you have any?
Regards,
Prem
ray.wurlod wrote:If you have Japanese characters in the input you will need the correct Japanese map to translate them when reading the file. There are many different encodings of Japanese characters; sometimes we even find that different columns are encoded differently, or that the map changes during a data stream (triggered by shift-in/shift-out characters).
There is no guarantee that using a different map when writing will magically "translate" the characters into a different encoding. Not all characters are represented in every encoding.
DataStage is not intended as a translation tool.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I have none. I am not working with Japanese data at the moment.
You might like to ask Ascential support - through your support provider, of course.
It is always a problem to be certain about how Japanese data are encoded. It is rare that the data owner knows for sure. Take a look at the drop-down list of possible mappings to see what I mean.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Solution
We have had this problem in the past, and Python will accurately encode and decode the Japanese UTF-8 characters. This is done in a pre-load process called by either a batch or Unix shell script. The link above has a working UTF-8 conversion program and shows how to implement it.
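As a rough sketch, a pre-load step of that kind might look like the following standalone script, invoked from the batch or shell wrapper before DataStage picks the file up. The file paths and default encoding names are illustrative, not taken from the thread:

```python
#!/usr/bin/env python3
# Illustrative pre-load conversion step: re-encode a file before the
# load job reads it. Streams line by line so large extracts do not
# have to fit in memory.
import sys

def convert(src, dst, src_enc="utf-8", dst_enc="ascii"):
    with open(src, encoding=src_enc) as fin, \
         open(dst, "w", encoding=dst_enc, errors="backslashreplace") as fout:
        for line in fin:
            fout.write(line)

if __name__ == "__main__" and len(sys.argv) >= 3:
    # e.g. convert.py extract.utf8 extract.ascii utf-8 ascii
    convert(*sys.argv[1:5])
```

A shell wrapper would call it as one step of the batch, then hand the converted file to the DataStage job.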