How to handle Chinese characters

Jag
Premium Member
Posts: 4
Joined: Tue Sep 15, 2009 8:38 am

Post by Jag »

I am receiving Chinese characters in a string and I need to replace them with spaces.

Is there any way to do this in DataStage 8.1?
The file contains the string 延安东路555号 and I need '555' as the output.

Thanks in advance
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Before attempting a solution, you need to define exactly what you mean by "Chinese characters". The various Chinese character sets contain not only ideographs but also the classic LATIN-1 and extended ASCII characters, so first you need to know exactly which characters you want to keep.
Here's a possibility: read the data in as simple non-UTF ASCII using a DataStage server job, then use ICONV(Data.Column,"MCP") to convert the non-printable characters to ".". Is that sufficient for your requirements?
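To illustrate the idea outside DataStage, here is a rough Python sketch of the same byte-level masking. It is not DataStage BASIC and does not call ICONV; the function name mask_non_printable is made up for this example, and it assumes the file is UTF-8 encoded and that "non-printable" means any byte outside ASCII 32 to 126. With a double-byte encoding such as GBK, some trail bytes fall inside the printable ASCII range, so the result would differ.

```python
def mask_non_printable(raw: bytes) -> str:
    """Replace every byte outside printable ASCII (32..126) with a period.

    Assumption: this only approximates what an "MCP"-style conversion does
    when the data is read byte by byte without NLS.
    """
    return "".join(chr(b) if 32 <= b <= 126 else "." for b in raw)


# Assumption: the input file is UTF-8, so each ideograph occupies three bytes.
data = "延安东路555号".encode("utf-8")

masked = mask_non_printable(data)

# Turn the periods into spaces and trim, leaving only the digits.
digits = masked.replace(".", " ").strip()
```

Note that any genuine "." in the data would also be turned into a space by the last step, so this only suits columns where periods carry no meaning.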
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you have NLS enabled, then 延, 安, 东, 路, 5 and 号 are all printable characters.

Perhaps you can use a server job, or a server Transformer stage in a server shared container, or a BASIC Transformer stage, with a routine that applies the UniSeq() function to each character in turn and checks the result. If the value returned by UniSeq() is higher than 255, then you have a non-ASCII character. That's over-simplified, but it addresses your requirement.
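The per-character check can be sketched in Python as follows. This is only an illustration of the logic, not a DataStage routine: ord() stands in for UniSeq(), the function name blank_above_threshold is hypothetical, and the 255 cut-off is the simplification mentioned above.

```python
def blank_above_threshold(text: str, threshold: int = 255) -> str:
    """Replace each character whose code point exceeds `threshold` with a space.

    ord() plays the role of UniSeq() here; characters at or below the
    threshold (ASCII and Latin-1) are kept unchanged.
    """
    return "".join(" " if ord(ch) > threshold else ch for ch in text)


raw = "延安东路555号"

# Every ideograph becomes one space; the digits are left alone.
blanked = blank_above_threshold(raw)

# Trimming the surrounding spaces leaves just the digits.
trimmed = blanked.strip()
```

In a real job the equivalent loop would live in a server routine called from the Transformer derivation, with a trim applied afterwards if only '555' is wanted.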
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.