junk character - how to identify

tsn
Participant
Posts: 51
Joined: Wed Jan 10, 2007 1:32 am

Post by tsn »

How can I identify junk (non-readable or non-printable) characters in a record? I need to find them while loading records from source to target. Is there a function for that?

Thanks.
with regards,
tsn
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Yes, you can use OCONV(In.MessyString,'MCP') to turn all those nasty characters into "."
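To make the effect of that masking concrete, here is a small Python sketch (not DataStage BASIC) of the same idea. It assumes "printable" means the ASCII range 0x20 to 0x7E; how the 'MCP' conversion treats bytes above 0x7E on a given installation is not covered here, and the helper name `mask_nonprintable` is hypothetical.

```python
def mask_nonprintable(raw: bytes, mask: str = ".") -> str:
    """Return raw as text, replacing every byte outside printable
    ASCII (0x20-0x7E, an assumption) with the mask character."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else mask for b in raw)


# A NUL, a BEL and one high byte are each replaced by "."
print(mask_nonprintable(b"Jo\x00hn\x07 Sm\xe9ith"))  # Jo.hn. Sm.ith
```

Comparing the masked string with the original is one simple way to tell whether a record contained anything outside that range.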
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You can use Oconv(InLink.TheString, "MX0C") to convert all characters to hex-encoded equivalents.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
tsn
Participant
Posts: 51
Joined: Wed Jan 10, 2007 1:32 am

Post by tsn »

I hope these are not junk characters; as I said earlier, they are non-readable characters. For example, if Thai characters come in the name attribute and we are not using NLS, we cannot read them. So how can we identify characters that are not readable as English?
with regards,
tsn
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You can use Oconv(InLink.TheString, "MX0C") to convert all characters to hex-encoded equivalents.
tsn
Participant
Posts: 51
Joined: Wed Jan 10, 2007 1:32 am

Post by tsn »

OK, thanks. How do I count how many such characters come in a record, and also in an attribute? And if a hexadecimal value itself comes in, how do I convert and count those?
with regards,
tsn
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

It will be displayed as 2 bytes per character unless you are using NLS
tsn
Participant
Posts: 51
Joined: Wed Jan 10, 2007 1:32 am

Post by tsn »

As I said earlier, we are not using NLS. If those characters come in the name attribute, it is a problem.

Example: NAME - VARCHAR(50).

If non-readable characters come in the NAME attribute, the requirement is to read the string and count how many of its characters are non-readable, so that the warning or job abort can be avoided. How can this be handled?
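The counting requirement described here can be sketched in a few lines of Python (an illustration of the logic only, not DataStage code). It assumes "non-readable" means any byte outside printable ASCII, 0x20 to 0x7E; `count_nonprintable` and the sample value are made up for the example.

```python
def count_nonprintable(raw: bytes) -> int:
    """Count bytes outside printable ASCII (0x20-0x7E, an assumption)."""
    return sum(1 for b in raw if not 0x20 <= b <= 0x7E)


# Hypothetical NAME value: two high bytes and a trailing tab
name = b"Som\xca\xc7chai\t"
print(len(name), count_nonprintable(name))  # 10 3
```

A non-zero count could then be used to route the row to a reject link instead of letting it reach the target.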
with regards,
tsn
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Why not just use LEN({string}) before doing the conversion to hex?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The "MX0C" conversion returns the code for each byte as two hexadecimal digits. For example, "A" (whose ASCII code is 65) will be converted to "41".
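For anyone who wants to see the two-digits-per-byte behaviour outside DataStage, the following Python sketch mimics it with the standard `bytes.hex()` method. It is only an analogy for the conversion described above, and `to_hex` is a hypothetical helper name.

```python
def to_hex(raw: bytes) -> str:
    """Encode each byte as two uppercase hexadecimal digits."""
    return raw.hex().upper()


print(to_hex(b"A"))        # 41
print(to_hex(b"AB\x00"))   # 414200

# The hex string is always twice as long as the input, so the original
# length is the hex length divided by two.
sample = b"Name\x07"
print(len(sample), len(to_hex(sample)))  # 5 10
```

This is also why taking the length before the hex conversion gives the character count directly.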
tsn
Participant
Posts: 51
Joined: Wed Jan 10, 2007 1:32 am

Post by tsn »

Here is the situation; take it as an example.

NAME is an attribute and its datatype is VARCHAR(50).

For a few records in the source file, Thai characters are coming in the NAME attribute. Since we are not using NLS, each one is treated as two bytes when it is inserted into the table under the NAME attribute.

If 10 Thai characters come in, they are stored in the table as 20 characters. If 30 Thai characters come in instead of 10, they will try to be stored as 60 characters, which gives a warning in DataStage. The project has a zero-warning setting, so the job aborts.
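The arithmetic in this example can be checked with a short Python sketch. It assumes the VARCHAR(50) limit is counted in bytes and that each Thai character arrives as two bytes, as described in the post; the byte pair used below is just a two-byte stand-in, and `overflow` is a hypothetical helper.

```python
NAME_LIMIT = 50  # VARCHAR(50) from the example, assumed to be counted in bytes


def overflow(raw: bytes, limit: int = NAME_LIMIT) -> int:
    """Return by how many bytes raw exceeds the limit (0 if it fits)."""
    return max(0, len(raw) - limit)


ten_chars = b"\x0e\x01" * 10     # 10 two-byte characters = 20 bytes
thirty_chars = b"\x0e\x01" * 30  # 30 two-byte characters = 60 bytes

print(overflow(ten_chars), overflow(thirty_chars))  # 0 10
```

A check like this before the load would show which rows are too long, rather than finding out from the warning.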
with regards,
tsn
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Not necessarily. In many cases Thai characters can be encoded in a single-byte character set. TIS620 (the "standard" Thai character set encoding on UNIX) is one example.
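A quick way to see the difference is to encode the same Thai word in several encodings and compare byte counts. The Python sketch below uses the standard `tis-620`, `utf-16-le` and `utf-8` codecs; the sample word and the helper name `byte_lengths` are chosen only for illustration, and which encoding the source file actually uses is not stated in this thread.

```python
# Thai greeting "sawatdi", written with escapes: 6 code points
word = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"


def byte_lengths(text: str) -> dict:
    """Return the encoded length of text in bytes for a few encodings."""
    return {enc: len(text.encode(enc)) for enc in ("tis-620", "utf-16-le", "utf-8")}


print(len(word))           # 6
print(byte_lengths(word))  # {'tis-620': 6, 'utf-16-le': 12, 'utf-8': 18}
```

So whether 30 Thai characters fit in a 50-byte column depends on the encoding of the incoming data, not on the characters being Thai.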
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So, in other words you have no 'junk character' issue but rather an issue handling multi-byte characters?
-craig

"You can never have too many knives" -- Logan Nine Fingers