DataStage processing to drop both duplicates

Posted: Mon Aug 26, 2013 1:13 pm
by Chandrathdsx
I am looking for the simplest way to drop all duplicated records from a sequential file, meaning every row whose key occurs more than once, not just the extra copies.

Input sequential file:
sno,sname
1,A
2,B
3,C
1,D
5,X
2,E


Desired output:
sno,sname
3,C
5,X

The requirement is to skip the records with sno = 1 and sno = 2, because the key sno has duplicates for those values in the input.

Appreciate all the help with this!
Thank you!

Posted: Mon Aug 26, 2013 4:52 pm
by ray.wurlod
Use a fork-join design to get the row count for each key, then filter the rows whose count is > 1, streaming that filtered set into a stage that performs the DELETE (for example, an ODBC stage using the text file driver).
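
The logic behind the fork-join suggestion can be sketched outside DataStage. The Python below is only an illustration, not DataStage code: it counts rows per key (the "fork" branch), looks each row's count up again (the "join"), and keeps the rows whose key occurs exactly once. That is the complement of the count > 1 set that the reply proposes to delete, so the end result should be the same. The function name drop_all_duplicates and the in-memory SAMPLE string are assumptions made for this example.

```python
import csv
import io
from collections import Counter


def drop_all_duplicates(rows, key="sno"):
    """Return only the rows whose key value occurs exactly once."""
    rows = list(rows)
    # "Fork": aggregate a row count per key value.
    counts = Counter(row[key] for row in rows)
    # "Join" + filter: look up each row's count and keep count == 1.
    return [row for row in rows if counts[row[key]] == 1]


# Sample data copied from the question.
SAMPLE = """sno,sname
1,A
2,B
3,C
1,D
5,X
2,E
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = drop_all_duplicates(rows)

for row in result:
    print(f"{row['sno']},{row['sname']}")
# prints:
# 3,C
# 5,X
```

In a job, the counting branch would typically be an aggregation on sno and the lookup a join back to the main stream on the same key; the sketch keeps everything in memory purely for brevity.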