junk character removal

Posted: Thu Aug 07, 2014 7:42 am
by altruist
Hi

I am trying to remove all junk characters from every field of the source data. To achieve this I am removing characters 0-8, 10-31 and 127-255 using the Convert function:

Code: Select all

Convert(Char(0):Char(1):Char(2).........etc,"",InputField)
But I see that the spaces are being removed from some of the fields. I checked those fields for junk characters, but didn't find any using

Code: Select all

echo "Field Value (Copied and Pasted)" | cat -v
and

Code: Select all

echo "Field Value (Copied and Pasted)" | od -c
I didn't notice any unusual characters in them. Basically, my code is removing all spaces between the characters.
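The same check can be done on raw bytes instead of pasted text. The following is only a rough sketch, written in Python rather than DataStage; `suspicious_bytes` is a hypothetical helper name, and it assumes the field value is available as a byte string. It lists every byte outside the printable ASCII range 32-126 together with its offset.

```python
def suspicious_bytes(value: bytes):
    """Return (offset, byte value) pairs for bytes outside printable ASCII (32-126)."""
    return [(offset, b) for offset, b in enumerate(value) if b < 32 or b > 126]


# A plain space (32) is printable ASCII and is not reported.
print(suspicious_bytes(b"Test1 Test2"))
# A byte such as 0xA0 (160) looks like a blank in many viewers but is reported.
print(suspicious_bytes(b"Test1\xa0Test2"))
```

An empty list for a field whose spaces still disappear would point at the expression rather than at the data.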

Posted: Thu Aug 07, 2014 12:16 pm
by chulett
First off, there are no "junk" characters... but I'll save that lecture for others.

We'd probably need to see your complete derivation to be able to help, something without the "etc" in it. And I'm also curious if you are doing any other derivations on the strings post-convert or if the convert is literally all you are doing.

Posted: Thu Aug 07, 2014 2:00 pm
by ray.wurlod
What do you get if you don't apply the Convert() function, since you assert that there are no "junk" characters present in the data?

Perhaps a better term would be non-alphanumeric.

Posted: Sun Aug 10, 2014 11:00 pm
by altruist

Code: Select all

Convert(Char(0):Char(1):Char(2):Char(3):Char(4):Char(5):Char(6):Char(7):Char(8):Char(9):Char(10):Char(11):Char(12):Char(13):Char(14):Char(15):Char(16):Char(17):Char(18):Char(19):Char(20):Char(21):Char(22):Char(23):Char(24):Char(25):Char(26):Char(27):Char(28):Char(29):Char(30):Char(31):Char(127):Char(128):Char(129):Char(130):Char(131):Char(132):Char(133):Char(134)........Char(255),"",TrimLeadingTrailing(NullToEmpty(InputField)))
This is removing the space between two values. E.g. for "Test1 Test2", I am getting "Test1Test2" as the output.
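For comparison, here is a small reference model of what the Convert() call is meant to do, written in Python rather than DataStage. It is a sketch only: `DELETE` and `strip_bytes` are hypothetical names, it models just the Convert() step (not TrimLeadingTrailing or NullToEmpty), and it assumes the delete list is exactly byte values 0-31 and 127-255 as in the derivation above.

```python
# Byte values to delete: 0-31 and 127-255. Space (32) is deliberately not in the list.
DELETE = bytes(list(range(0, 32)) + list(range(127, 256)))


def strip_bytes(value: bytes) -> bytes:
    """Delete every byte listed in DELETE and keep everything else unchanged."""
    return value.translate(None, DELETE)


print(strip_bytes(b"Test1 Test2"))          # the space should survive
print(strip_bytes(b"Te\x00st1 \xa4Test2"))  # control and high bytes are dropped
```

With a delete list built this way the space between the two values is kept, which is the behaviour the derivation was expected to have.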

Posted: Sun Aug 10, 2014 11:57 pm
by altruist
While debugging I found that the issue occurs when I try to remove extended ASCII characters, i.e. values 128 through 255.

Do we have to use a different function for such values?

Posted: Mon Aug 11, 2014 3:26 am
by ray.wurlod
You may need a second option on the Char() function to specify these characters correctly. It asserts whether or not the most significant bit is on.
For example:

Code: Select all

Char(164, @TRUE)
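The remark about the most significant bit suggests one possible explanation for the missing spaces. The sketch below, in Python rather than DataStage, assumes that Char() without the 8-bit flag keeps only the low 7 bits of its argument; that behaviour is an assumption and is not confirmed anywhere in this thread. Under that assumption Char(160) would come out as Char(32), a space, so a Convert() list running up to Char(255) would delete spaces as a side effect. `low_seven_bits` and `codes_colliding_with` are hypothetical helper names.

```python
def low_seven_bits(code: int) -> int:
    """Clear the most significant bit of an 8-bit code (ASSUMED Char() behaviour)."""
    return code & 0x7F


def codes_colliding_with(target: int, codes) -> list:
    """Return the codes that map onto `target` once the top bit is cleared."""
    return [c for c in codes if low_seven_bits(c) == target]


# Which codes in 128-255 would turn into a space (32) under this assumption?
print(codes_colliding_with(32, range(128, 256)))  # -> [160]
```

If this is what happens, leaving Char(160) out of the list, or supplying the 8-bit flag where the product accepts it, should stop the spaces from disappearing; either way it is worth testing rather than taking as given.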

Posted: Wed Aug 13, 2014 1:19 am
by altruist
Hi Ray,

I am using DataStage 8.1, and it looks like there is no second argument to the Char() function.

Posted: Wed Aug 13, 2014 4:11 pm
by ray.wurlod
Did you actually try it? It's not a documented argument.

Posted: Thu Aug 14, 2014 7:28 am
by altruist
Hi Ray,

I did try it, but the derivation field showed up as not valid.

Posted: Thu Aug 14, 2014 4:15 pm
by ray.wurlod
That shows only that the expression editor parser doesn't like it. Does it compile and work properly?

Posted: Sat Aug 16, 2014 10:37 am
by altruist
Hi Ray,

I am unable to compile it in 8.1 either.

Posted: Sun Aug 17, 2014 4:54 am
by ray.wurlod
OK, maybe it only works in BASIC-based components, such as the BASIC Transformer stage.

Posted: Sat Aug 30, 2014 1:43 am
by altruist
Hi Ray,

Is there any way to remove those extended EBCDIC characters, since Char(164, @TRUE) cannot be used in 8.1?

Posted: Mon Sep 01, 2014 7:24 am
by priyadarshikunal
It may not be widely used, but the "allow 8 bits" flag is present in the parallel Transformer as well. @TRUE may not be the correct value for it.

The Parallel Job Developer's Guide states:

Char Generates an ASCII character from its numeric code value. You can optionally specify the allow8bits argument to convert 8-bit ASCII values.

Try using "TRUE" or 1 for that argument, as I can't find any example. I can confirm that the optional argument exists for parallel jobs in 8.5 through 9.1. You can check the Parallel Job Developer's Guide for 8.1 to see whether it is there as well.

Here is the link for char function in 9.1.

Posted: Mon Sep 08, 2014 9:41 am
by altruist
I am not able to find a similar function in 8.1.

Do you think I can construct something using OR_BITS?