How to remove duplicate & capture removed records in fil

akash_nitj
Participant
Posts: 27
Joined: Fri Aug 13, 2004 3:36 am
Location: INDIA

Post by akash_nitj »

Hi Techies
Is there a way in DataStage to remove duplicate records and also capture the removed duplicates in a file?

The Remove Duplicates stage doesn't have a reject link.

Is there any other easy way to do this?
TIA
akash
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Morning Akash,

As you have noticed, the Remove Duplicates stage does only that and won't allow a second (reject) output link. If your input data stream is sorted on the comparison columns, I would use a Transformer stage with stage variables to detect whether each row is a duplicate of the previous one.

For example:

CurrentCompareString = {concatenated list of columns to use for comparison}
DuplicateRecord = IF LastCompareString = CurrentCompareString THEN @TRUE ELSE @FALSE
LastCompareString = CurrentCompareString
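
Stage variables are evaluated in the order listed, so LastCompareString must be assigned after DuplicateRecord. Then use the logical value of "DuplicateRecord" in the output link constraints: one link passes rows where it is false (the unique records) and a second link passes rows where it is true (the captured duplicates).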
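
To make the stage-variable logic easier to follow outside of DataStage, here is a minimal Python sketch of the same idea. It is only an illustration, not Transformer code: the function name split_duplicates, the key_columns parameter, the "|" separator, and the sample rows are all assumptions made up for this example. Like the Transformer approach, it relies on the input already being sorted on the comparison columns.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

Row = Tuple[str, ...]


def split_duplicates(
    rows: Iterable[Row], key_columns: Sequence[int]
) -> Tuple[List[Row], List[Row]]:
    """Split sorted rows into (unique_rows, duplicate_rows).

    Hypothetical helper: mirrors the three stage variables from the post.
    Rows must already be sorted on key_columns.
    """
    unique_rows: List[Row] = []
    duplicate_rows: List[Row] = []
    last_compare_string: Optional[str] = None  # no previous row yet

    for row in rows:
        # CurrentCompareString: concatenation of the comparison columns
        current_compare_string = "|".join(str(row[i]) for i in key_columns)
        # DuplicateRecord: true when the key matches the previous row's key
        duplicate_record = current_compare_string == last_compare_string
        # LastCompareString: updated only after the comparison is done
        last_compare_string = current_compare_string

        # The two "constraints": one output per value of the flag
        if duplicate_record:
            duplicate_rows.append(row)
        else:
            unique_rows.append(row)

    return unique_rows, duplicate_rows


# Made-up sample data, sorted on the first column before processing
rows = sorted(
    [("A", "1"), ("B", "2"), ("A", "1"), ("C", "3"), ("B", "9")],
    key=lambda r: r[0],
)
unique_rows, duplicate_rows = split_duplicates(rows, key_columns=[0])
```

The first row of each key group goes to the unique output and every later row with the same key goes to the duplicate output, which is what the two link constraints do in the job.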

Another option would be to use the CDC stage and two different SELECTs (one with UNIQUE) on the source data...