DataStage processing to drop both duplicates

Chandrathdsx · Post by **Chandrathdsx** » Mon Aug 26, 2013 1:13 pm

I am looking for the simplest way to drop every record whose key is duplicated in a sequential file, rather than keeping one copy of each.

Input sequential file:
sno,sname
1,A
2,B
3,C
1,D
5,X
2,E

Desired output:
sno,sname
3,C
5,X

The requirement is to skip processing of the records with sno = 1 and sno = 2, because the key sno has duplicates in the input.

Appreciate all the help with this!
Thank you!

ray.wurlod · Post by **ray.wurlod** » Mon Aug 26, 2013 4:52 pm

Use a fork-join design to get the row count for each key, then filter the rows that have a count > 1 and stream that filtered set into a stage that performs the DELETE (for example, an ODBC stage using the text file driver).
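
For readers who want to see the counting logic outside DataStage, here is a minimal Python sketch of the same idea. It is not a DataStage job: the function name `drop_all_duplicates` and the inline `SAMPLE` data are made up for illustration, and instead of deleting the rows with a count > 1 it simply keeps the rows whose key count is exactly 1, which should give the same result for the sample input.

```python
import csv
import io
from collections import Counter

# Sample input copied from the question (inline here so the sketch is self-contained).
SAMPLE = """sno,sname
1,A
2,B
3,C
1,D
5,X
2,E
"""


def drop_all_duplicates(rows, key):
    """Keep only the rows whose key value occurs exactly once in the input."""
    rows = list(rows)
    # "Fork": one branch aggregates a row count per key value.
    counts = Counter(row[key] for row in rows)
    # "Join": look the count up for each original row, then filter on it.
    return [row for row in rows if counts[row[key]] == 1]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = drop_all_duplicates(rows, "sno")

print("sno,sname")
for row in result:
    print(f"{row['sno']},{row['sname']}")
```

In a parallel job the equivalent shape would presumably be a Copy stage feeding an Aggregator (count per sno), a Join back to the original stream on sno, and a Filter on the count, but treat that stage layout as an assumption rather than a tested design.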