Moving all duplicate records to a second file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


manish1005
Participant
Posts: 39
Joined: Thu Nov 23, 2006 11:23 pm

Moving all duplicate records to a second file

Post by manish1005 »

Hi,

My requirement is to read data from a sequential file and move all records that have duplicates in the primary key column to one file, while moving the others to a second file (the order of records does not matter).

Input file (the first column is the key column):
abc, 123
rtr, 3434
abc, 5454
def, 34353
rtr, 343436
efe, 34343


Output file 1:
abc, 123
rtr, 3434
abc, 5454
rtr, 343436

Output file 2:
def, 34353
efe, 34343
(The order of records does not matter.)

Please suggest.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Read the same file from two different stages. Route one stream through an Aggregator to count the rows per key and write the result to a hash file, which can then be used as a lookup on the second stream to identify the duplicates.
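
Outside DataStage, the same logic can be sketched in a few lines of Python: the first pass plays the role of the Aggregator writing counts to a hash file, and the second pass plays the role of the lookup that routes each row. The file names and the comma-delimited layout (key in the first column, as in the example above) are assumptions for illustration only.

from collections import Counter

# First pass: count occurrences of each key (the "Aggregator into a hash file" step).
with open("input.txt") as f:
    key_counts = Counter(line.split(",")[0].strip() for line in f if line.strip())

# Second pass: use the counts as a lookup to route each record (the second stream).
with open("input.txt") as f, \
     open("duplicates.txt", "w") as dups, \
     open("singles.txt", "w") as singles:
    for line in f:
        if not line.strip():
            continue
        key = line.split(",")[0].strip()
        (dups if key_counts[key] > 1 else singles).write(line)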
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Sort your input file on the key in DataStage, then use two Transformer stages with stage variables to detect duplicates and split the two streams.

Sort 1
sort by ascending Key

Transform 1
Stage Variables
svDuplicate : IF In.Key=svLastKey THEN svDuplicate+1 ELSE 1
svLastKey : In.Key
Add a column DuplicateNumber derived from svDuplicate to the output (the counter starts at 1 on the first row of each key, so the DuplicateNumber = 1 tests in Transform 2 work).

Sort 2
sort by ascending Key
and descending DuplicateNumber

Transform 2
Stage Variables
svUnique : IF In.Key=svLastKey THEN 0 ELSE 1
svLastKey : In.Key
Constraints for Duplicates link : In.DuplicateNumber > 1 OR (In.DuplicateNumber = 1 AND svUnique=0)
Constraints for Singles link : In.DuplicateNumber = 1 AND svUnique=1
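
For what it's worth, here is a rough Python rendering of the same sort / transform / sort / transform logic, purely to make the stage-variable mechanics concrete; the in-memory sort and the file name are illustrative assumptions, not part of the DataStage design.

# Read the records, assuming a small comma-delimited file with the key in the first column.
with open("input.txt") as f:
    rows = [line.rstrip("\n") for line in f if line.strip()]

key_of = lambda row: row.split(",")[0].strip()

# Sort 1: ascending key.
rows.sort(key=key_of)

# Transform 1: running per-key counter (svDuplicate / svLastKey), starting at 1.
numbered, last_key, dup_no = [], None, 0
for row in rows:
    dup_no = dup_no + 1 if key_of(row) == last_key else 1
    last_key = key_of(row)
    numbered.append((row, dup_no))

# Sort 2: ascending key, descending DuplicateNumber, so the first row of each
# key group carries that key's total count.
numbered.sort(key=lambda pair: (key_of(pair[0]), -pair[1]))

# Transform 2: apply the two link constraints.
duplicates, singles, last_key = [], [], None
for row, dup_no in numbered:
    sv_unique = key_of(row) != last_key
    last_key = key_of(row)
    if dup_no > 1 or (dup_no == 1 and not sv_unique):
        duplicates.append(row)    # Duplicates link
    else:
        singles.append(row)       # Singles link: dup_no == 1 and sv_unique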
manish1005
Participant
Posts: 39
Joined: Thu Nov 23, 2006 11:23 pm

Post by manish1005 »

Thanks Sainath & ArndW.
I will try out the sort -> transform -> sort -> transform approach today.


I was wondering if there is a better way to solve this in the parallel edition (my client is planning to purchase the parallel edition)?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Different, don't know about 'better'. Or you could do the exact same thing there.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

In PX, you can use the uniq command to extract the duplicates (or the others) and fork the streams accordingly.
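
To see the uniq idea spelled out, a rough stand-in (only an illustration of what a sort | uniq -c style pipeline would do, with assumed file names) is to group the sorted rows by key and fork each group on its size:

from itertools import groupby

# Group sorted rows by key and fork each group on its size.
with open("input.txt") as f:
    rows = sorted((l for l in f if l.strip()), key=lambda l: l.split(",")[0].strip())

with open("duplicates.txt", "w") as dups, open("singles.txt", "w") as singles:
    for _, group in groupby(rows, key=lambda l: l.split(",")[0].strip()):
        group = list(group)
        (dups if len(group) > 1 else singles).writelines(group)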