How to handle Chinese characters

Jag · Post by **Jag** » Tue Sep 18, 2012 7:11 am

I am receiving Chinese characters in a string and I need them to be replaced with spaces.

Is there any way to do this in DataStage 8.1?
The file contains the string 延安东路555号 and I need '555' as the output.

Thanks in advance.

ArndW · Post by **ArndW** » Tue Sep 18, 2012 7:48 am

Before attempting a solution, you need to define exactly what you mean by "Chinese characters". The various Chinese character sets contain not only ideographs but also the classic Latin-1 and extended ASCII characters, so first you need to know exactly which characters you want to keep.
Here's one possibility: read in the data as simple non-UTF ASCII using a DataStage server job, and use ICONV(Data.Column,"MCP") to convert non-printable characters to ".". Would that be sufficient for your requirements?
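To illustrate why the definition matters, here is a small Python sketch (not DataStage code). It assumes two common encodings, UTF-8 and GBK, because the thread does not say which one the file uses. In both, the digits keep their single-byte ASCII values, while each ideograph becomes a multi-byte sequence made only of bytes at or above 0x80. The helper `ascii_bytes` is hypothetical.

```python
def ascii_bytes(text: str, encoding: str) -> bytes:
    """Encode `text` and keep only the bytes below 0x80 (plain ASCII)."""
    return bytes(b for b in text.encode(encoding) if b < 0x80)


text = "延安东路555号"  # sample string from the question
for encoding in ("utf-8", "gbk"):  # assumed encodings; the thread names none
    raw = text.encode(encoding)
    print(encoding, len(raw), ascii_bytes(text, encoding))
# utf-8 18 b'555'
# gbk 13 b'555'
```

The byte counts differ (three bytes per ideograph in UTF-8, two in GBK), so any byte-level approach depends on knowing the encoding first.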
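The MCP idea can be approximated outside DataStage with the Python sketch below. It assumes that "printable" means the bytes 0x20 to 0x7E and that the file is UTF-8; both are assumptions, and the exact behaviour of ICONV(...,"MCP") in a non-NLS server job may differ. The function name `mask_non_printable` is hypothetical.

```python
def mask_non_printable(raw: bytes) -> str:
    """Replace every byte outside printable ASCII (0x20..0x7E) with '.'.

    Assumption: this only approximates ICONV(..., "MCP") on data read
    as single-byte characters; it is not the DataStage implementation.
    """
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in raw)


raw = "延安东路555号".encode("utf-8")  # assumed UTF-8 input file
print(mask_non_printable(raw))  # ............555...
```

Each ideograph turns into three dots here because UTF-8 uses three bytes per ideograph. The dots could then be removed or replaced with spaces, although any genuine periods in the data would be affected as well.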

ray.wurlod · Post by **ray.wurlod** » Tue Sep 18, 2012 3:27 pm

If you have NLS enabled, then 延, 安, 东, 路, 5 and 号 are all printable characters.
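As an analogy only (Python's rules are not DataStage NLS rules), Python's `str.isprintable()`, which works on Unicode character categories, also treats every character in the sample string as printable. That is why a "mask non-printable characters" approach stops working once the data is handled as Unicode text. The helper name `all_printable` is hypothetical.

```python
def all_printable(text: str) -> bool:
    """True if every character is printable under Python's Unicode rules."""
    return all(ch.isprintable() for ch in text)


text = "延安东路555号"
print([(ch, ch.isprintable()) for ch in text])
print(all_printable(text))  # True
```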

Perhaps you can use a server job, a server Transformer stage in a server shared container, or a BASIC Transformer stage, with a routine that applies the UniSeq() function to each character in turn and checks the result. If UniSeq() returns a value higher than 255, then you have a non-ASCII (indeed non-Latin-1) character; that's over-simplified, but it addresses your requirement.
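The per-character check described above can be sketched in Python, with `ord()` standing in for UniSeq(). This is not a DataStage routine. The function name `blank_wide_chars` and its `limit` parameter are hypothetical, and the final trim assumes that the surrounding spaces are not wanted in the output.

```python
def blank_wide_chars(text: str, limit: int = 255) -> str:
    """Replace every character whose code point exceeds `limit` with a space.

    ord() plays the role of UniSeq() here; 255 mirrors the threshold
    suggested in the thread.
    """
    return "".join(" " if ord(ch) > limit else ch for ch in text)


result = blank_wide_chars("延安东路555号")
print(repr(result))    # '    555 '
print(result.strip())  # 555
```

A limit of 255 keeps Latin-1 letters such as é, while a limit of 127 would blank every character outside plain ASCII. Which one is right depends on the characters you decide to keep.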