
junk character - how to identify

Posted: Thu May 29, 2008 4:03 am
by tsn
How do I identify junk (non-readable or non-printable) characters in a record? I need to find them while loading records from source to target. Is there a function for that?


tks.

Posted: Thu May 29, 2008 5:12 am
by ArndW
Yes, you can use OCONV(In.MessyString,'MCP') to turn all those nasty characters into "."
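For example, a minimal BASIC fragment along these lines would also give you a junk count (a sketch only; In.MessyString is just the incoming value, and any genuine full stops have to be subtracted back out):

      Clean = Oconv(In.MessyString, "MCP")                        ;* every non-printable byte becomes "."
      JunkCount = Count(Clean, ".") - Count(In.MessyString, ".")  ;* periods added by MCP = junk bytes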

Posted: Thu May 29, 2008 5:41 am
by ray.wurlod
You can use Oconv(InLink.TheString, "MX0C") to convert all characters to hex-encoded equivalents.
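For instance (illustrative values only):

      HexView = Oconv(InLink.TheString, "MX0C")   ;* "A1." would come out as "41312E"

Anything outside x'20' to x'7E' then stands out in the hex view.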

Posted: Mon Jun 02, 2008 12:58 am
by tsn
I don't think these are junk characters; as I said earlier, they are just not readable. For example, if Thai characters come in the name attribute and we are not using NLS, you cannot read those characters. So how can we identify the characters that are not understandable to an English reader?

Posted: Mon Jun 02, 2008 1:04 am
by ray.wurlod
You can use Oconv(InLink.TheString, "MX0C") to convert all characters to hex-encoded equivalents.

Posted: Mon Jun 02, 2008 1:14 am
by tsn
OK, thanks. How will I count how many characters come in a record, and also in an attribute? Assuming the hexadecimal value itself is coming in, how would you convert and count those?

Posted: Mon Jun 02, 2008 2:22 am
by ArndW
It will be displayed as 2 bytes per character unless you are using NLS.

Posted: Mon Jun 02, 2008 2:31 am
by tsn
As I said earlier, we are not using NLS. If those characters come in the NAME attribute, then it is a problem.

Example: NAME - VARCHAR(50).

If a non-readable character comes in the NAME attribute, the requirement here is to read and count how many non-readable characters are in the string (NAME), so that the warning or job abort can be avoided. How do we handle this?

Posted: Mon Jun 02, 2008 2:35 am
by ArndW
Why not just use LEN({string}) before doing the conversion to hex?
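Something like this, as a sketch (the names are illustrative):

      ByteCount = Len(InLink.TheString)             ;* without NLS, Len() counts bytes
      HexView   = Oconv(InLink.TheString, "MX0C")   ;* hex dump of the same bytes for inspection

If the byte count is roughly double the number of characters you expected, that is the multi-byte data showing up.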

Posted: Mon Jun 02, 2008 5:54 am
by ray.wurlod
The "MX0C" conversion returns the code for each byte as two hexadecimal digits. For example, "A" (whose ASCII code is 65) will be converted to "41".
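So one way to count the suspect bytes is to step through the hex output two digits at a time and flag anything outside the printable ASCII range, roughly like this (a sketch; the variable names are made up):

      HexStr = Oconv(InLink.TheString, "MX0C")
      NonPrintable = 0
      For BytePos = 1 To Len(HexStr) Step 2
         ByteVal = Iconv(HexStr[BytePos, 2], "MX")   ;* two hex digits back to a decimal byte value
         If ByteVal < 32 Or ByteVal > 126 Then NonPrintable = NonPrintable + 1
      Next BytePos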

Posted: Tue Jun 03, 2008 2:03 am
by tsn
Here is the situation; take this as an example.

NAME is an attribute and its datatype is VARCHAR(50).

In the source file, Thai characters come in the NAME attribute for a few records. Since we are not using NLS, each one is treated as two bytes, which is what gets inserted into the table under the NAME attribute.

If 10 Thai characters come in, they are stored in the table as 20 characters. If 30 Thai characters come in instead of 10, they would need 60 characters of storage. That gives a warning in DataStage, and since the project has a 0-warning setting, the job gets aborted.

Posted: Tue Jun 03, 2008 6:01 am
by ray.wurlod
Not necessarily. In many cases Thai characters can be encoded in a single-byte character set. TIS620 (the "standard" Thai character set encoding on UNIX) is one example.

Posted: Tue Jun 03, 2008 6:08 am
by chulett
So, in other words you have no 'junk character' issue but rather an issue handling multi-byte characters?
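If so, one way to keep the load from aborting (a sketch only; the 50 comes from your VARCHAR(50) example and the link names are invented) is to test the byte length in a Transformer and divert the oversized rows:

      * Constraint on the output link that feeds the database: row still fits the column
      Len(InLink.NAME) <= 50
      * Constraint on a reject link: capture the oversized rows for investigation
      Len(InLink.NAME) > 50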