Eliminate Duplicate data

pranaychaturvedi3 · Post by **pranaychaturvedi3** » Thu Feb 17, 2011 12:14 am

How do we eliminate duplicate data using stage variables in a Transformer in DataStage?

devesh_ssingh · Post by **devesh_ssingh** » Thu Feb 17, 2011 1:26 am

Why not use the Remove Duplicates (RD) stage rather than a Transformer?
Is this an interview question?

pranaychaturvedi3 · Post by **pranaychaturvedi3** » Thu Feb 17, 2011 2:52 am

devesh_ssingh wrote: Why not use the Remove Duplicates (RD) stage rather than a Transformer?
Is this an interview question?

Actually, the duplicate data has to be sent to a separate file.

Vidyut · Post by **Vidyut** » Thu Feb 17, 2011 3:00 am

Bro, search DSXchange. This question has been answered at least 10 times.

Thanks

devesh_ssingh · Post by **devesh_ssingh** » Thu Feb 17, 2011 3:05 am

You never said the duplicates had to be captured.

There are many ways, but here is one that I have tried and tested.

Sort the data using a Sort stage on the key column that determines your duplicates, then aggregate on the same key.
That gives you the key column value and a count for each key.

Now inner join the input file with the output from the Aggregator, then add a Transformer with two output files.
The constraint is on the count: count = 1 goes to the unique output, and count > 1 goes to the duplicate output.
Choose the partitioning method carefully.

In the Sort stage, hash partition on the same key column you are sorting on.
The Aggregator should use the same partitioning.
In the Join, use hash partitioning on both links.
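
For anyone who wants to see the logic outside of a job design, here is a rough sketch in plain Python (not DataStage code). It assumes a count of 1 means the key is unique and anything higher means duplicate. The function name `split_by_key_count` and the sample rows are made up for illustration.

```python
from collections import Counter


def split_by_key_count(rows, key):
    """Split rows into (unique, duplicate) lists by how often each key value occurs."""
    # Aggregator step: count the rows for each key value.
    counts = Counter(row[key] for row in rows)

    unique, duplicate = [], []
    for row in rows:
        # Join + Transformer constraint: look up the count for this row's key
        # and route the row to one of the two outputs.
        if counts[row[key]] == 1:
            unique.append(row)
        else:
            duplicate.append(row)
    return unique, duplicate


# Hypothetical sample data: id 1 appears twice, id 2 once.
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},
]
unique, duplicate = split_by_key_count(rows, "id")
```

Note that with this approach every row of a repeated key lands in the duplicate output, including the first occurrence.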

stuartjvnorton · Post by **stuartjvnorton** » Thu Feb 17, 2011 7:14 am

I know this doesn't answer your interview question (I think you all must have applied to the same place...), but here goes.

Sort with "Create Key Change Column" enabled, then add a Filter stage: KCC = 1 is good data (the first row for each key) and KCC = 0 is a dupe.
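
As a rough illustration of what the key change column does, here is a small Python sketch (again, not DataStage code). It assumes the flag is 1 on the first row of each sorted key group and 0 on the rest. The function name `add_key_change`, the `KCC` field name, and the sample rows are hypothetical.

```python
def add_key_change(rows, key):
    """Sort rows by key and flag the first row of each key group with KCC=1, the rest with KCC=0."""
    ordered = sorted(rows, key=lambda r: r[key])  # stable sort, like sorting on the key column
    flagged = []
    previous = object()  # sentinel that never equals a real key value
    for row in ordered:
        kcc = 1 if row[key] != previous else 0
        flagged.append({**row, "KCC": kcc})
        previous = row[key]
    return flagged


# Hypothetical sample data: id 1 appears twice, id 2 once.
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},
]
flagged = add_key_change(rows, "id")

# Filter step: split on the flag.
good = [r for r in flagged if r["KCC"] == 1]
dupes = [r for r in flagged if r["KCC"] == 0]
```

Unlike the count-based approach, this keeps one row per key in the good output and sends only the extra rows to the dupes output.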