capture duplicates

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

just4u_sharath
Premium Member
Posts: 236
Joined: Sun Apr 01, 2007 7:41 am
Location: Michigan

capture duplicates

Post by just4u_sharath »

I am removing duplicates from a dataset using the Remove Duplicates stage. My requirement now is to capture the duplicates that were removed and write them to a sequential file. How can I capture those removed duplicates?
Mayur Dongaonkar
Participant
Posts: 20
Joined: Mon Dec 11, 2006 10:57 am
Location: Pune

Re: capture duplicates

Post by Mayur Dongaonkar »

Duplicates can be captured with the following stages:

Dataset ----> Sort (on key columns) ----> Aggregator (group on key columns + count operation) ----> Filter (count > 1) ----> Sequential File
Mayur Dongaonkar.
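
The flow above can be sketched outside DataStage as a rough illustration. This is plain Python, not DataStage code; the function name `keys_with_duplicates` and the sample column names are hypothetical. Under those assumptions, the group-and-count step followed by the `count > 1` filter yields one row per duplicated key with its count, not the individual duplicate records:

```python
from collections import Counter


def keys_with_duplicates(rows, key_cols):
    """Emulate Sort -> Aggregator (count per key) -> Filter (count > 1).

    rows is a list of dicts; key_cols names the key columns (hypothetical).
    Returns (key, count) pairs for keys that occur more than once,
    ordered by key.
    """
    # Aggregator step: count the rows in each key group.
    counts = Counter(tuple(row[col] for col in key_cols) for row in rows)
    # Filter step: keep only the groups whose count exceeds 1.
    return sorted((key, n) for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    sample = [
        {"id": 1, "value": "a"},
        {"id": 2, "value": "b"},
        {"id": 1, "value": "c"},
    ]
    for key, n in keys_with_duplicates(sample, ["id"]):
        print(key, n)
```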
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

Set the "Create Cluster Key Change Column" property in the Sort stage to true. This creates the output field "clusterKeyChange". The value in this field will be 1 for the first record of each key group and 0 for all of its duplicate records. Using a Filter stage with a condition on the "clusterKeyChange" field, you can send the duplicates to one link and the unique records to another.
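
As a rough sketch of that idea in plain Python (not DataStage code; `split_on_key_change` and the sample column names are hypothetical), the flag is 1 on the first row of each key group and 0 on the rest, and the filter routes rows by that flag:

```python
def split_on_key_change(rows, key_cols):
    """Emulate a sort with a key-change flag followed by a two-way filter.

    Returns (uniques, duplicates): the first row of each key group goes to
    uniques (flag == 1); every later row of the group goes to duplicates
    (flag == 0).
    """
    def key_of(row):
        return tuple(row[col] for col in key_cols)

    # sorted() is stable, so rows with equal keys keep their input order.
    ordered = sorted(rows, key=key_of)

    uniques, duplicates = [], []
    previous_key = object()  # sentinel that never equals a real key
    for row in ordered:
        key = key_of(row)
        key_change = 1 if key != previous_key else 0
        if key_change == 1:
            uniques.append(row)
        else:
            duplicates.append(row)
        previous_key = key
    return uniques, duplicates


if __name__ == "__main__":
    sample = [
        {"id": 1, "value": "a"},
        {"id": 2, "value": "b"},
        {"id": 1, "value": "c"},
    ]
    kept, removed = split_on_key_change(sample, ["id"])
    print("unique:", kept)
    print("duplicates:", removed)
```

Unlike the count-based flow, this keeps the full duplicate records, which is what would be written to the sequential file.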
Das
Participant
Posts: 87
Joined: Tue Oct 24, 2006 9:58 pm
Location: india

Post by Das »

Maveric wrote: Set the "Create Cluster Key Change Column" property in the Sort stage to true. This creates the output field "clusterKeyChange". The value in this field will be 1 for the first record of each key group and 0 for all of its duplicate records. Using a Filter stage with a condition on the "clusterKeyChange" field, you can send the duplicates to one link and the unique records to another.
That works, but I have a doubt: why do we need ClusterKeyChange? Isn't this possible with KeyChange? I have used key change on many occasions. Can anybody explain the scenario in which we would use ClusterKeyChange, and what the difference is?

Thanks in advance
yousuff1710
Participant
Posts: 56
Joined: Fri Sep 21, 2007 9:10 am
Location: Bangalore

Post by yousuff1710 »

You are right. The key change option is used when the sort mode is Sort. ClusterKeyChange is used when the sort mode is Don't Sort (Previously Sorted).
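
To illustrate the "previously sorted" case, here is a minimal Python sketch (not DataStage code; the function name `add_cluster_key_change` is hypothetical). It assumes the input is already grouped by key, so it does no sorting and only compares each row's key with the previous row's key:

```python
def add_cluster_key_change(rows, key_cols):
    """Add a clusterKeyChange flag to rows that are ALREADY grouped by key.

    No sort is performed. The flag is 1 on the first row of each run of
    equal keys and 0 on the following rows of that run.
    """
    flagged = []
    previous_key = object()  # sentinel that never equals a real key
    for row in rows:
        key = tuple(row[col] for col in key_cols)
        flag = 1 if key != previous_key else 0
        flagged.append({**row, "clusterKeyChange": flag})
        previous_key = key
    return flagged


if __name__ == "__main__":
    presorted = [{"id": 1}, {"id": 1}, {"id": 2}, {"id": 3}, {"id": 3}]
    for row in add_cluster_key_change(presorted, ["id"]):
        print(row)
```

If the input is not actually grouped, the same key can start more than one run and be flagged 1 more than once, which is why this variant only fits data that arrives sorted or grouped.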
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

This has been discussed many times. Just try a search for "capture duplicate".