Eliminate Duplicate data

Posted: Thu Feb 17, 2011 12:14 am
by pranaychaturvedi3
How do we eliminate duplicate data using stage variables in a Transformer in DataStage?

Posted: Thu Feb 17, 2011 1:26 am
by devesh_ssingh
Why not use the Remove Duplicates (RD) stage rather than a Transformer?
Is this an interview question? :wink:

Posted: Thu Feb 17, 2011 2:52 am
by pranaychaturvedi3
devesh_ssingh wrote: Why not use the Remove Duplicates (RD) stage rather than a Transformer?
Is this an interview question? :wink:

Actually, the duplicate data has to be sent to a separate file.

Posted: Thu Feb 17, 2011 3:00 am
by Vidyut
Bro, search DSXchange... this question has been answered at least 10 times.

Thanks

Posted: Thu Feb 17, 2011 3:05 am
by devesh_ssingh
You never said the duplicates had to be captured...

There are many ways, but here is one which I have tried and tested.

Sort the data using a Sort stage on the key column(s) that decide what counts as a duplicate, then aggregate on the same key. That gives you the key value and a count.

Now inner join the input file with the output from the Aggregator, then add a Transformer with two output links. The constraint is: count = 1 goes to the unique output, otherwise (count > 1) the row goes to the duplicate output.

Choose the partitioning method carefully:

In the Sort, hash partition on the key columns in the same order as the sort.
The Aggregator should keep the same partitioning.
In the Join, use hash partitioning on both input links.
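
To make the logic of the steps above concrete, here is a minimal sketch in plain Python (not DataStage code). The function name `split_by_key_count`, the column names `id` and `v`, and the sample rows are all made up for illustration. Note that with this approach every copy of a repeated key lands in the duplicate output, not just the second and later copies.

```python
from collections import Counter

def split_by_key_count(rows, key):
    """Route rows whose key occurs exactly once to `unique`, all others to `duplicates`."""
    # Aggregator step: key value -> number of rows carrying that key.
    counts = Counter(row[key] for row in rows)
    unique, duplicates = [], []
    # Sort step, then "join" each row to its count and apply the constraint.
    for row in sorted(rows, key=lambda r: r[key]):
        if counts[row[key]] == 1:
            unique.append(row)
        else:
            duplicates.append(row)
    return unique, duplicates

# Hypothetical sample data: key 2 appears twice.
rows = [
    {"id": 2, "v": "b"},
    {"id": 1, "v": "a"},
    {"id": 2, "v": "c"},
    {"id": 3, "v": "d"},
]
unique, duplicates = split_by_key_count(rows, "id")
```

Here `unique` holds the rows for keys 1 and 3, and `duplicates` holds both rows for key 2.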

Posted: Thu Feb 17, 2011 7:14 am
by stuartjvnorton
I know this doesn't answer your interview question (I think you all must have applied to the same place...), but here goes.

Sort with "Create Key Change Column" enabled, then use a Filter: KCC = 1 means good data (the first row for each key) and KCC = 0 means dupes.
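
As a rough illustration of what the key change column does, here is a small plain-Python sketch (not DataStage code). The names `add_key_change` and `kcc`, and the sample rows, are assumptions for the example. Comparing each row's key with the previous row's key is also the usual idea behind doing this with stage variables in a Transformer.

```python
def add_key_change(rows, key):
    """Sort by key and flag the first row of each key group with 1, the rest with 0."""
    flagged = []
    previous = object()  # sentinel that never equals a real key value
    for row in sorted(rows, key=lambda r: r[key]):
        kcc = 1 if row[key] != previous else 0
        flagged.append({**row, "kcc": kcc})
        previous = row[key]  # remember this key for the next row
    return flagged

# Hypothetical sample data: key 2 appears twice.
rows = [
    {"id": 2, "v": "b"},
    {"id": 1, "v": "a"},
    {"id": 2, "v": "c"},
    {"id": 3, "v": "d"},
]
flagged = add_key_change(rows, "id")
good = [r for r in flagged if r["kcc"] == 1]   # first occurrence of each key
dupes = [r for r in flagged if r["kcc"] == 0]  # later occurrences go to the separate file
```

Unlike the count-and-join approach, this keeps the first row of each key as good data and sends only the later copies to the duplicate output.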