Eliminate Duplicate data


pranaychaturvedi3
Participant
Posts: 2
Joined: Thu Feb 17, 2011 12:11 am

Post by pranaychaturvedi3 »

How do we eliminate duplicate data using stage variables in a Transformer in DataStage?
devesh_ssingh
Participant
Posts: 148
Joined: Thu Apr 10, 2008 12:47 am

Post by devesh_ssingh »

Why not use the Remove Duplicates (RD) stage rather than a Transformer?
Is this an interview question? :wink:
pranaychaturvedi3
Participant
Posts: 2
Joined: Thu Feb 17, 2011 12:11 am

Post by pranaychaturvedi3 »

devesh_ssingh wrote: Why not use the Remove Duplicates (RD) stage rather than a Transformer?
Is this an interview question? :wink:

Actually, the duplicate data has to be sent to a separate file.
Vidyut
Participant
Posts: 24
Joined: Wed Oct 13, 2010 12:45 am

Post by Vidyut »

Bro, search DSXchange. This question has been answered at least 10 times.

Thanks
devesh_ssingh
Participant
Posts: 148
Joined: Thu Apr 10, 2008 12:47 am

Post by devesh_ssingh »

You never said the duplicates had to be captured...


There are many ways, but here is one that I have tried and tested.

Sort the data with a Sort stage on the key column(s) that define a duplicate,
then aggregate on the same key.
That gives you
the key column value and a row count.

Now inner join the input file with the output from the Aggregator,
then add a Transformer with two output files.
The constraint is on the count: count = 1 goes to the unique output, and count > 1 goes to the duplicate output.
Choose the partitioning method carefully.

In the Sort stage, hash partition on the key columns in the same order as the sort keys.
The Aggregator should keep the same partitioning,
but in the Join use hash partitioning on both links.
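
For anyone who finds the flow easier to follow as logic than as stages, here is a rough Python sketch of the count-and-join idea above. It is not DataStage code, only an illustration. The function name `split_by_key_count`, the `id` key column, and the sample rows are all made up for the example.

```python
from collections import Counter


def split_by_key_count(rows, key):
    """Split rows into (unique, duplicates) by how many rows share each key value."""
    # Aggregator step: count the rows for each key value.
    counts = Counter(row[key] for row in rows)

    unique, duplicates = [], []
    # Join + Transformer step: look up each row's count and route the row.
    for row in rows:
        if counts[row[key]] == 1:
            unique.append(row)      # key occurs exactly once
        else:
            duplicates.append(row)  # key occurs more than once
    return unique, duplicates


# Hypothetical sample data: id 1 appears twice, id 2 once.
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},
]
unique, duplicates = split_by_key_count(rows, "id")
```

Note that with this routing every row whose key repeats lands in the duplicate output, including the first occurrence.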
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

I know this doesn't answer your interview question (I think you all must have applied to the same place...), but here goes.

Sort with "create key change column" enabled, then Filter where KCC = 1 means good data and KCC = 0 means dupes.
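
The key change column idea can be sketched in Python as well. Again this is only an illustration of the logic, not DataStage code. The names `add_key_change` and `kcc`, the `id` key column, and the sample rows are assumptions for the example.

```python
def add_key_change(rows, key):
    """Sort rows on key and flag the first row of each key group with 1, the rest with 0."""
    flagged = []
    previous = object()  # sentinel that never equals a real key value
    for row in sorted(rows, key=lambda r: r[key]):
        # A row starts a new group when its key differs from the previous row's key.
        kcc = 1 if row[key] != previous else 0
        flagged.append({**row, "kcc": kcc})
        previous = row[key]
    return flagged


# Hypothetical sample data: id 1 appears twice, id 2 once.
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},
]
flagged = add_key_change(rows, "id")

# Filter step: kcc = 1 is good data, kcc = 0 is dupes.
good = [r for r in flagged if r["kcc"] == 1]
dupes = [r for r in flagged if r["kcc"] == 0]
```

Here the first row of each key group stays in the good output and only the later rows go to the dupes output, which differs from routing on a per-key count.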