Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.
Kbhujad
Participant
Posts: 10 Joined: Mon Jul 21, 2008 7:31 am
Location: Pune
by Kbhujad » Tue Jul 22, 2008 12:39 am
We have created a DataStage parallel job that removes junk characters from a sequential file and loads the data into a Netezza table.
We are able to remove the junk characters with a BASIC Transformer, using the Convert and Oconv functions.
However, we think the BASIC Transformer might cause a performance problem.
Can anyone suggest an alternative way to remove the junk characters, using a parallel Transformer or anything else?
keshav0307
Premium Member
Posts: 783 Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia
by keshav0307 » Tue Jul 22, 2008 12:49 am
ray.wurlod
Participant
Posts: 54607 Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
by ray.wurlod » Tue Jul 22, 2008 1:03 am
Make very sure that they're junk! Don't assume anything.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Kbhujad
Participant
Posts: 10 Joined: Mon Jul 21, 2008 7:31 am
Location: Pune
by Kbhujad » Tue Jul 22, 2008 2:31 am
We have confirmed that these characters are control characters.
miwinter
Participant
Posts: 396 Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK
by miwinter » Tue Jul 22, 2008 8:32 am
When you say 'control characters' - what do you mean exactly? (newline characters? tabs?) Examples please.
Mark Winter
Nothing appeases a troubled mind more than good music
Kbhujad
Participant
Posts: 10 Joined: Mon Jul 21, 2008 7:31 am
Location: Pune
by Kbhujad » Tue Jul 22, 2008 11:44 pm
Control characters such as ^A, ^Z, etc. should be removed.
Specifically, the characters with the following ASCII values (written in octal) should be removed from the input field:
"\000\001\002\003\004\005\006\007\010\011\013\014\015\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037\177\377"
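To make the character set above concrete, here is a minimal sketch in Python (not DataStage syntax, and not a suggested job design) that builds the same set of byte values and strips them from a field. The names `JUNK_BYTES` and `strip_junk` are made up for illustration. Note that the list skips \012 (line feed), so the sketch leaves line feeds in place.

```python
# Octal codes listed in the post: \000-\037 except \012 (line feed),
# plus \177 (DEL) and \377.
JUNK_BYTES = bytes(b for b in range(0o40) if b != 0o12) + bytes([0o177, 0o377])


def strip_junk(field: bytes) -> bytes:
    """Return the field with every byte in JUNK_BYTES deleted."""
    # bytes.translate with a table of None applies no mapping and
    # only deletes the bytes given in the second argument.
    return field.translate(None, JUNK_BYTES)


if __name__ == "__main__":
    sample = b"ab\x01c\x1a\n\xffd"  # contains ^A, ^Z, a line feed and \377
    print(strip_junk(sample))
```

Working on raw bytes rather than decoded text avoids having to assume a character encoding for the input file.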