Hi,
My requirement is to read data from a sequential file and move all records which have duplicates in primary key column to one file while moving others to a second file(order or records does not matter).
Input File(first column is key column)
abc, 123
rtr, 3434
abc, 5454
def, 34353
rtr, 343436
efe, 34343
Output file 1:
abc, 123
rtr, 3434
abc, 5454
rtr, 343436
Output file 2:
def, 34353
efe, 34343
(order or records does not matter).
Please suggest.
Moving all Duplicate records to second file
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 39
- Joined: Thu Nov 23, 2006 11:23 pm
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
Sort your input file on the key in DataStage, then use 2 transform stages and stage variables to detect duplicates and split the 2 streams.
Sort 1
sort by ascending Key
Transform 1
Stage Variables
svDuplicate : IF In.Key=svLastKey THEN svDuplicate+1 ELSE 0
svLastKey : In.Key
Add column DuplicateNumber derived from svDuplicate to output.
Sort 2
sort by ascending Key
and descending DuplicateNumber
Transform 2
Stage Variables
svUnique : IF In.Key=svLastKey THEN 0 ELSE 1
svLastKey : In.Key
Constraints for Duplicates link : In.DuplicateNumber > 1 OR (In.DuplicateNumber = 1 AND svUnique=0)
Constraints for Singles link : In.DuplicateNumber = 1 AND svUnique=1
Sort 1
sort by ascending Key
Transform 1
Stage Variables
svDuplicate : IF In.Key=svLastKey THEN svDuplicate+1 ELSE 0
svLastKey : In.Key
Add column DuplicateNumber derived from svDuplicate to output.
Sort 2
sort by ascending Key
and descending DuplicateNumber
Transform 2
Stage Variables
svUnique : IF In.Key=svLastKey THEN 0 ELSE 1
svLastKey : In.Key
Constraints for Duplicates link : In.DuplicateNumber > 1 OR (In.DuplicateNumber = 1 AND svUnique=0)
Constraints for Singles link : In.DuplicateNumber = 1 AND svUnique=1
-
- Participant
- Posts: 39
- Joined: Thu Nov 23, 2006 11:23 pm
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom