Remove Duplicates

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kumar66
Participant
Posts: 265
Joined: Thu Jul 26, 2007 12:14 am

Remove Duplicates

Post by kumar66 »

Hi All,

I have to remove the duplicates and also need to capture the duplicate records.

I thought of doing it with the Remove Duplicates stage, but it didn't work out.
Please advise how to do it.


Thanks & Regards,
Kumar66
BugFree
Participant
Posts: 82
Joined: Wed Dec 13, 2006 6:02 am

Post by BugFree »

Do a search on "remove and capture the duplicate records" (Search for all terms). This has been answered a lot of times here.
Ping me if I am wrong...
mcs_suman
Participant
Posts: 20
Joined: Thu Sep 27, 2007 8:42 am
Location: chennai

REMOVE DUPLICATES

Post by mcs_suman »

Use the Sort stage with the Create Key Change Column option. The key change column is '1' for the first record in each key group and '0' for the following records with the same key, i.e. the duplicates. Then use a Filter stage to send these records to separate outputs based on the key change value.




source ---> sort ---> filter ---> target
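
For anyone who wants to see the logic outside DataStage, here is a rough Python sketch of the sort / key change / filter idea. It is not DataStage code. The function name `split_duplicates`, the sample rows and the `id` key are made up for illustration, and it assumes the key change flag is 1 for the first row of each key group and 0 for the rest.

```python
from itertools import groupby
from operator import itemgetter


def split_duplicates(rows, key):
    """Sort rows by key, flag the first row of each key group, then split.

    Hypothetical helper: mimics a sort with a key change flag followed by
    a filter with two outputs.
    """
    sorted_rows = sorted(rows, key=key)  # Python's sort is stable
    unique, duplicates = [], []
    for _, group in groupby(sorted_rows, key=key):
        for position, row in enumerate(group):
            # Assumed convention: 1 = first row of the group, 0 = repeat.
            key_change = 1 if position == 0 else 0
            if key_change == 1:
                unique.append(row)
            else:
                duplicates.append(row)
    return unique, duplicates


# Made-up sample data keyed on "id".
rows = [
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b2"},
    {"id": 1, "name": "a2"},
    {"id": 3, "name": "c"},
]
unique, duplicates = split_duplicates(rows, key=itemgetter("id"))
```

With this approach one row per key goes to the target and every later row with the same key is captured as a duplicate.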



BugFree wrote:Do a search on "remove and capture the duplicate records" (Search for all terms). This has been answered a lot of times here.
:roll: :roll:
suman
kumar66
Participant
Posts: 265
Joined: Thu Jul 26, 2007 12:14 am

Post by kumar66 »

Hi mcs_suman,

Thanks for your response.

I tried it a different way.

source -----> Copy stage -----> Lookup -----> Transformer -----> Target
                  |               ^
                  v               |
              Aggregator ---------+


I did a lookup on the row count. If the row count is more than 1, I reject the row.
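
The same idea can be sketched in plain Python, just to show the logic. This is not DataStage code. The function name `split_by_key_count`, the sample rows and the `id` key are made up, and it assumes the row count is taken per key value.

```python
from collections import Counter
from operator import itemgetter


def split_by_key_count(rows, key):
    """Count rows per key, look the count up for each row, reject if > 1.

    Hypothetical helper: the Counter plays the role of the aggregated
    counts, and the dictionary access plays the role of the lookup.
    """
    counts = Counter(key(row) for row in rows)
    kept, rejected = [], []
    for row in rows:
        if counts[key(row)] > 1:
            rejected.append(row)  # key occurs more than once
        else:
            kept.append(row)      # key occurs exactly once
    return kept, rejected


# Made-up sample data keyed on "id".
rows = [
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b2"},
    {"id": 3, "name": "c"},
]
kept, rejected = split_by_key_count(rows, key=itemgetter("id"))
```

Note that in this sketch every row whose key occurs more than once is rejected, including the first occurrence, so only keys that were never duplicated reach the target.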

Thanks & regards,
Kumar66