Removal of Junk Characters using parallel transformer

Kbhujad
Participant
Posts: 10
Joined: Mon Jul 21, 2008 7:31 am
Location: Pune

Post by Kbhujad »

We have created a DataStage parallel job that removes junk characters from a sequential file and loads the data into a Netezza table.

We are able to remove the junk characters in a BASIC Transformer using the Convert and Oconv functions.

However, we think the BASIC Transformer might cause a performance issue.

Can anyone suggest an alternative way to remove junk characters, using a parallel Transformer or anything else?
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Make very sure that they're junk! Don't assume anything.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Kbhujad
Participant
Posts: 10
Joined: Mon Jul 21, 2008 7:31 am
Location: Pune

Post by Kbhujad »

We have confirmed that these characters are control characters.
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

When you say 'control characters' - what do you mean exactly? (newline characters? tabs?) Examples please.
Mark Winter
Nothing appeases a troubled mind more than good music
Kbhujad
Participant
Posts: 10
Joined: Mon Jul 21, 2008 7:31 am
Location: Pune

Post by Kbhujad »

Control characters such as ^A, ^Z, etc. should be removed.

The characters with the following octal values should be removed from the input field:
"\000\001\002\003\004\005\006\007\010\011\013\014\015\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037\177\377"
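
To make the requirement concrete, here is a minimal sketch in Python (not DataStage) of the intended behaviour: every character in the list above is deleted and everything else is left alone. The names `CONTROL_CHARS` and `strip_control_chars` are made up for illustration. Note that the list as posted does not contain \012 (line feed), so line feeds would survive.

```python
# Octal escapes copied from the list in the post.
# \012 (line feed) is not in that list, so it is not removed here.
CONTROL_CHARS = (
    "\000\001\002\003\004\005\006\007\010\011\013\014\015\016\017"
    "\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037"
    "\177\377"
)

# Map each listed code point to None so that str.translate deletes it.
_DELETE_TABLE = {ord(ch): None for ch in CONTROL_CHARS}


def strip_control_chars(value: str) -> str:
    """Return value with every character from CONTROL_CHARS removed."""
    return value.translate(_DELETE_TABLE)


if __name__ == "__main__":
    sample = "AB\001C\032D"  # contains ^A and ^Z
    print(repr(strip_control_chars(sample)))
```

The same idea (a fixed list of characters to delete, applied to the whole field) is what a Convert-style function does; whether a parallel Transformer offers an equivalent is the open question in this thread.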