Page 1 of 1

Moving all Duplicate records to second file

Posted: Wed Apr 21, 2010 3:03 am
by manish1005
Hi,

My requirement is to read data from a sequential file and move all records which have duplicates in primary key column to one file while moving others to a second file(order or records does not matter).

Input File(first column is key column)
abc, 123
rtr, 3434
abc, 5454
def, 34353
rtr, 343436
efe, 34343


Output file 1:
abc, 123
rtr, 3434
abc, 5454
rtr, 343436

Output file 2:
def, 34353
efe, 34343
(order or records does not matter).

Please suggest.

Posted: Wed Apr 21, 2010 3:21 am
by Sainath.Srinivasan
Use the same file from two different stages - one going through aggregator to total the keys and into an hash file, which can be used as a lookup to locate duplicates on second stream.

Posted: Wed Apr 21, 2010 3:29 am
by ArndW
Sort your input file on the key in DataStage, then use 2 transform stages and stage variables to detect duplicates and split the 2 streams.

Sort 1
sort by ascending Key

Transform 1
Stage Variables
svDuplicate : IF In.Key=svLastKey THEN svDuplicate+1 ELSE 0
svLastKey : In.Key
Add column DuplicateNumber derived from svDuplicate to output.

Sort 2
sort by ascending Key
and descending DuplicateNumber

Transform 2
Stage Variables
svUnique : IF In.Key=svLastKey THEN 0 ELSE 1
svLastKey : In.Key
Constraints for Duplicates link : In.DuplicateNumber > 1 OR (In.DuplicateNumber = 1 AND svUnique=0)
Constraints for Singles link : In.DuplicateNumber = 1 AND svUnique=1

Posted: Wed Apr 21, 2010 11:54 pm
by manish1005
Thanks Sainath& ArndW.
I will go try out sort->transform->sort->transform today.


I was wondering if there is any better way available to solve it in parallel edition(my client is planning on purchasing parallel edition)?

Posted: Thu Apr 22, 2010 6:52 am
by chulett
Different, don't know about 'better'. Or you could do the exact same thing there.

Posted: Thu Apr 22, 2010 7:05 am
by Sainath.Srinivasan
In PX, you can use uniq command to extract duplicate (or others) and fork accordingly.