Moving all duplicate records to a second file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


manish1005
Participant
Posts: 39
Joined: Thu Nov 23, 2006 11:23 pm

Moving all duplicate records to a second file

Post by manish1005 »

Hi,

My requirement is to read data from a sequential file and move all records that have duplicates in the primary key column to one file, while moving the others to a second file (the order of records does not matter).

Input file (the first column is the key column):
abc, 123
rtr, 3434
abc, 5454
def, 34353
rtr, 343436
efe, 34343


Output file 1:
abc, 123
rtr, 3434
abc, 5454
rtr, 343436

Output file 2:
def, 34353
efe, 34343
(The order of records does not matter.)

Please suggest.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Read the same file from two different stages. Route one stream through an Aggregator to count the rows per key and write the result to a hash file, which can then be used as a lookup on the second stream to identify the duplicates.
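
Outside DataStage, the same logic can be sketched in a few lines of Python: the first pass plays the role of the Aggregator writing counts to a hash file, and the second pass plays the role of the lookup that routes each row. The file names and the comma-delimited layout (key in the first column, as in the example above) are assumptions for illustration only.

from collections import Counter

# First pass: count occurrences of each key (the "Aggregator into a hash file" step).
with open("input.txt") as f:
    key_counts = Counter(line.split(",")[0].strip() for line in f if line.strip())

# Second pass: use the counts as a lookup to route each record (the second stream).
with open("input.txt") as f, \
     open("duplicates.txt", "w") as dups, \
     open("singles.txt", "w") as singles:
    for line in f:
        if not line.strip():
            continue
        key = line.split(",")[0].strip()
        (dups if key_counts[key] > 1 else singles).write(line)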
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Sort your input file on the key in DataStage, then use two Transformer stages with stage variables to detect duplicates and split the two streams.

Sort 1
sort by ascending Key

Transform 1
Stage Variables
svDuplicate : IF In.Key=svLastKey THEN svDuplicate+1 ELSE 1
svLastKey : In.Key
Add a column DuplicateNumber derived from svDuplicate to the output (the counter starts at 1 on the first row of each key, so the DuplicateNumber = 1 tests in Transform 2 work).

Sort 2
sort by ascending Key
and descending DuplicateNumber

Transform 2
Stage Variables
svUnique : IF In.Key=svLastKey THEN 0 ELSE 1
svLastKey : In.Key
Constraints for Duplicates link : In.DuplicateNumber > 1 OR (In.DuplicateNumber = 1 AND svUnique=0)
Constraints for Singles link : In.DuplicateNumber = 1 AND svUnique=1
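
For what it's worth, here is a rough Python rendering of the same sort / transform / sort / transform logic, purely to make the stage-variable mechanics concrete; the in-memory sort and the file name are illustrative assumptions, not part of the DataStage design.

# Read the records, assuming a small comma-delimited file with the key in the first column.
with open("input.txt") as f:
    rows = [line.rstrip("\n") for line in f if line.strip()]

key_of = lambda row: row.split(",")[0].strip()

# Sort 1: ascending key.
rows.sort(key=key_of)

# Transform 1: running per-key counter (svDuplicate / svLastKey), starting at 1.
numbered, last_key, dup_no = [], None, 0
for row in rows:
    dup_no = dup_no + 1 if key_of(row) == last_key else 1
    last_key = key_of(row)
    numbered.append((row, dup_no))

# Sort 2: ascending key, descending DuplicateNumber, so the first row of each
# key group carries that key's total count.
numbered.sort(key=lambda pair: (key_of(pair[0]), -pair[1]))

# Transform 2: apply the two link constraints.
duplicates, singles, last_key = [], [], None
for row, dup_no in numbered:
    sv_unique = key_of(row) != last_key
    last_key = key_of(row)
    if dup_no > 1 or (dup_no == 1 and not sv_unique):
        duplicates.append(row)    # Duplicates link
    else:
        singles.append(row)       # Singles link: dup_no == 1 and sv_unique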
manish1005
Participant
Posts: 39
Joined: Thu Nov 23, 2006 11:23 pm

Post by manish1005 »

Thanks Sainath & ArndW.
I will try out the sort -> transform -> sort -> transform approach today.


I was wondering if there is a better way to solve this in the parallel edition (my client is planning to purchase the parallel edition)?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Different, don't know about 'better'. Or you could do the exact same thing there.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

In PX, you can use the uniq command to extract the duplicates (or the others) and fork the streams accordingly.
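
To see the uniq idea spelled out, a rough stand-in (only an illustration of what a sort | uniq -c style pipeline would do, with assumed file names) is to group the sorted rows by key and fork each group on its size:

from itertools import groupby

# Group sorted rows by key and fork each group on its size.
with open("input.txt") as f:
    rows = sorted((l for l in f if l.strip()), key=lambda l: l.split(",")[0].strip())

with open("duplicates.txt", "w") as dups, open("singles.txt", "w") as singles:
    for _, group in groupby(rows, key=lambda l: l.split(",")[0].strip()):
        group = list(group)
        (dups if len(group) > 1 else singles).writelines(group)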