capture duplicates

just4u_sharath · Post by **just4u_sharath** » Tue Jan 08, 2008 1:16 am

I am removing duplicates from a dataset using the Remove Duplicates stage. My requirement now is to capture the duplicates that are removed and write them to a sequential file. How can I capture those removed duplicates?

Mayur Dongaonkar · Post by **Mayur Dongaonkar** » Tue Jan 08, 2008 2:17 am

Duplicates can be captured with the following stages:

Dataset ---> Sort (on key columns) ---> Aggregator (group on key columns + count operation) ---> Filter (count > 1) ---> Sequential File
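To make the logic of that job design concrete, here is a rough Python sketch of what the Aggregator and Filter steps compute. It is only an illustration of the idea, not DataStage code; the function name `duplicate_keys` and the sample rows are made up for the example.

```python
from collections import Counter

def duplicate_keys(rows, key_columns):
    """Count rows per key and keep only keys seen more than once.

    Mirrors "Aggregator (count) -> Filter (count > 1)". A Counter does not
    need sorted input, so the Sort step is not reproduced here.
    """
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical sample data: "id" plays the role of the key column.
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},
    {"id": 3, "name": "d"},
    {"id": 1, "name": "e"},
]

dupes = duplicate_keys(rows, ["id"])
print(dupes)
```

Note that this design yields the duplicated key values and their counts, not the individual duplicate rows, so the non-key columns of the removed records are not carried through.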

Maveric · Post by **Maveric** » Tue Jan 08, 2008 2:39 am

Set the "Create Cluster Key Change Column" property in the Sort stage to True. This creates the output field "clusterKeyChange". The value in this field is 1 for the first record of each key group and 0 for all of its duplicate records. Using a Filter stage with a condition on the "clusterKeyChange" field, you can send the duplicates down one link and the unique records down another.
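The flag-and-filter idea can be sketched in Python as follows. This is an illustration of the logic only, not DataStage code; the helper names `flag_key_change` and `split_duplicates` and the sample rows are assumptions made for the example.

```python
from operator import itemgetter

def flag_key_change(rows, key_columns):
    """Sort rows by key and add a flag: 1 for the first row of each key group, else 0."""
    get_key = itemgetter(*key_columns)
    ordered = sorted(rows, key=get_key)  # stable sort keeps input order within a key
    previous = object()  # sentinel that never equals a real key
    flagged = []
    for row in ordered:
        key = get_key(row)
        flagged.append({**row, "clusterKeyChange": 1 if key != previous else 0})
        previous = key
    return flagged

def split_duplicates(rows, key_columns):
    """Play the role of the Filter stage: one output per flag value."""
    flagged = flag_key_change(rows, key_columns)
    unique = [r for r in flagged if r["clusterKeyChange"] == 1]
    duplicates = [r for r in flagged if r["clusterKeyChange"] == 0]
    return unique, duplicates

# Hypothetical sample data: "id" is the key column.
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},
    {"id": 3, "name": "d"},
    {"id": 1, "name": "e"},
]

unique, duplicates = split_duplicates(rows, ["id"])
print([r["name"] for r in duplicates])
```

The `duplicates` list holds the full records that a remove-duplicates step would have dropped, which is what the original question asks to write to a file.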

Das · Post by **Das** » Mon Mar 03, 2008 6:37 am

Maveric wrote: Set the "Create Cluster Key Change Column" property in the Sort stage to True. This creates the output field "clusterKeyChange". The value in this field is 1 for the first record of each key group and 0 for all of its duplicate records. Using a Filter stage with a condition on the "clusterKeyChange" field, you can send the duplicates down one link and the unique records down another.

That works, but I have a doubt: why do we need ClusterKeyChange? Isn't this possible with KeyChange? I have used KeyChange on many occasions. Can anybody explain the scenario in which we would use ClusterKeyChange, and what the difference is?

Thanks in advance

yousuff1710 · Post by **yousuff1710** » Thu Jan 29, 2009 3:52 am

You are right. The Key Change option is used when the sort mode is Sort. ClusterKeyChange is used when the sort mode is Don't Sort (Previously Sorted).
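A small Python sketch of why the sort mode matters: a flag computed without sorting is only meaningful if rows with the same key are already adjacent. This illustrates the logic and is not DataStage code; `flag_groups` and the sample rows are hypothetical.

```python
def flag_groups(rows, key_columns):
    """Flag key changes without sorting, trusting that equal keys are adjacent."""
    flagged = []
    previous = None
    first = True
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        change = 1 if first or key != previous else 0
        flagged.append({**row, "clusterKeyChange": change})
        previous, first = key, False
    return flagged

# Hypothetical input with the duplicate key 1 split apart.
unsorted_rows = [{"id": 1}, {"id": 2}, {"id": 1}]
presorted_rows = sorted(unsorted_rows, key=lambda r: r["id"])

flags_unsorted = [r["clusterKeyChange"] for r in flag_groups(unsorted_rows, ["id"])]
flags_presorted = [r["clusterKeyChange"] for r in flag_groups(presorted_rows, ["id"])]

print(flags_unsorted)   # duplicate of id 1 is missed because it is not adjacent
print(flags_presorted)  # duplicate of id 1 is flagged with 0
```

On the unsorted input every row looks like the start of a new group, so the duplicate is not detected. That is why the no-sort variant only makes sense when the data arrives already sorted on the key.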

keshav0307 · Post by **keshav0307** » Fri Jan 30, 2009 6:27 am

This has been discussed many times. Just try searching for "capture duplicate".

DSXchange
