capture duplicates
-
- Premium Member
- Posts: 236
- Joined: Sun Apr 01, 2007 7:41 am
- Location: Michigan
capture duplicates
From a dataset I am removing duplicates using the Remove Duplicates stage. Now my requirement is to capture the duplicates that are removed and write them to a sequential file. How can I capture those removed duplicates?
-
- Participant
- Posts: 20
- Joined: Mon Dec 11, 2006 10:57 am
- Location: Pune
Re: capture duplicates
Duplicates can be captured with the following stages:
Dataset ----> Sort (on key columns) ----> Aggregator (group on key columns + count operation) ----> Filter (count > 1) ----> Sequential File
Mayur Dongaonkar.
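To make the logic of that job design concrete, here is a minimal Python sketch of the same idea outside DataStage. The function name `duplicated_keys`, the sample rows, and the column names `id` and `name` are made up for illustration. Note that counting per key only tells you which key values occur more than once; it does not return the individual rows that a Remove Duplicates stage would have dropped.

```python
from collections import Counter

def duplicated_keys(rows, key_columns):
    """Count rows per key and keep only keys that occur more than once.

    Mirrors: Aggregator (group on key columns + count) -> Filter (count > 1).
    """
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical sample data; "id" plays the role of the key column.
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},
    {"id": 3, "name": "d"},
    {"id": 1, "name": "e"},
]

dup_keys = duplicated_keys(rows, ["id"])
print(dup_keys)
```

If the actual removed rows are needed rather than just the duplicated key values, a per-row flag on sorted data is the more direct route.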
Set the "Create Cluster Key Change Column" property in the Sort stage to true. This creates the output field "clusterKeyChange". The value in this field will be 1 for the first record of each key group and 0 for all of its duplicate records. Using a Filter stage with a condition on the "clusterKeyChange" field, you can send the duplicates down one link and the unique records down another.
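The flag-and-filter approach can be sketched in plain Python as follows. This is only an illustration of the logic, not DataStage code: the function name `split_on_key_change` and the sample rows are hypothetical, and the local variable `key_change` stands in for the generated flag column.

```python
def split_on_key_change(rows, key_columns):
    """Sort rows on the key, flag the first row of each key group with 1
    and the rest with 0, then route rows by that flag."""
    def key_of(row):
        return tuple(row[col] for col in key_columns)

    # sorted() is stable, so the first row seen for a key stays first.
    ordered = sorted(rows, key=key_of)

    unique, duplicates = [], []
    previous_key = object()  # sentinel that never equals a real key
    for row in ordered:
        key = key_of(row)
        key_change = 1 if key != previous_key else 0
        if key_change == 1:
            unique.append(row)      # link 1: one row per key
        else:
            duplicates.append(row)  # link 2: the rows that would be removed
        previous_key = key
    return unique, duplicates

# Hypothetical sample data; "id" plays the role of the key column.
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},
    {"id": 3, "name": "d"},
    {"id": 1, "name": "e"},
]

unique_rows, removed_rows = split_on_key_change(rows, ["id"])
print([r["name"] for r in unique_rows])
print([r["name"] for r in removed_rows])
```

The second list is what would be written to the sequential file in the original requirement.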
Maveric wrote: Set the "Create Cluster Key Change Column" property in sort stage to true. This creates the output field "clusterKeyChange". The values in this field will be 1 for a record, and 0 for all its duplicate records. Using the filter stage you can get the duplicates in one link and unique records in one link by applying the filter condition on "clusterKeyChange" field.
That's OK, but I have a doubt: why do we need to go for clusterKeyChange? Isn't it possible with keyChange? I have used keyChange on many occasions. Can anybody explain the scenario in which we would go for clusterKeyChange, and what the difference is?
Thanks in advance
-
- Participant
- Posts: 56
- Joined: Fri Sep 21, 2007 9:10 am
- Location: Bangalore
-
- Premium Member
- Posts: 783
- Joined: Mon Jan 16, 2006 10:17 pm
- Location: Sydney, Australia