DataStage processing to drop both duplicates

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Chandrathdsx
Participant
Posts: 59
Joined: Sat Jul 05, 2008 11:32 am

Post by Chandrathdsx »

I am looking for the simplest way to drop every record whose key is duplicated in a sequential file, that is, to drop all copies rather than keep one of them.

Input sequential file:
sno,sname
1,A
2,B
3,C
1,D
5,X
2,E


Desired output:
sno,sname
3,C
5,X

The requirement is to skip the records with sno = 1 and sno = 2 entirely, because the key sno occurs more than once in the input.

Appreciate all the help with this!
Thank you!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use a fork-join design to get the row count for each key, then filter for the rows that have a count > 1, streaming that filtered set into a stage that performs the DELETE (for example, an ODBC stage using the text file driver).
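
The counting-and-filtering logic behind a fork-join design can be sketched outside DataStage. The following is only an illustration in Python, not a DataStage job and not necessarily how the stages above would be configured: the function name drop_all_duplicates and the inline SAMPLE data (copied from the question) are assumptions made for the example. It also keeps the rows whose key count is 1 instead of deleting the rows whose count is > 1, which gives the same result for this input.

```python
import csv
import io
from collections import Counter

# Sample input copied from the question; a real job would read the sequential file.
SAMPLE = """sno,sname
1,A
2,B
3,C
1,D
5,X
2,E
"""

def drop_all_duplicates(rows, key):
    """Return only the rows whose key value occurs exactly once (hypothetical helper)."""
    rows = list(rows)
    # "Fork": one branch aggregates a row count per key value.
    counts = Counter(row[key] for row in rows)
    # "Join": each original row looks up the count for its key;
    # rows whose key occurs more than once are dropped entirely.
    return [row for row in rows if counts[row[key]] == 1]

reader = csv.DictReader(io.StringIO(SAMPLE))
result = drop_all_duplicates(reader, "sno")

print("sno,sname")
for row in result:
    print(f"{row['sno']},{row['sname']}")
```

With the sample data this leaves only the 3,C and 5,X records, matching the desired output in the question. The input order of the surviving rows is preserved because the filter walks the original row list.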
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.