Hi All,
I have a scenario where I need to capture duplicate records to a file.
Example 1:
column1 column2
1 james
2 james
Example 2:
column1 column2
1 mary
1 charle
In both scenarios I need to reject both of the records. How can I implement this logic in DataStage?
Capturing duplicates
Sort on the key columns on which you want to find duplicates using a Sort stage, with "create key change column" set to Yes. Then use a Filter or Transformer stage with the following constraints:
1. key change column = 1: put this constraint on the master output link (unique records).
2. key change column = 0: put this constraint on a second link to collect the duplicates.
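Outside DataStage, the behaviour of the Sort stage's key change column can be sketched in Python. This is an illustrative model only (the row data and function name are made up for the example): after sorting on the key, the first row of each key group gets keyChange = 1 and every later row in the group gets keyChange = 0, which is exactly what the two constraints above split on.

```python
def add_key_change(rows, key):
    """Sort rows on `key` and append a keyChange flag to each row,
    mimicking the Sort stage's 'create key change column' option."""
    out = []
    prev = object()  # sentinel that never equals a real key value
    for row in sorted(rows, key=lambda r: r[key]):
        flag = 1 if row[key] != prev else 0  # 1 = first row of its group
        out.append({**row, "keyChange": flag})
        prev = row[key]
    return out

rows = [{"col1": 1, "col2": "james"},
        {"col1": 2, "col2": "james"},
        {"col1": 3, "col2": "mary"}]

flagged = add_key_change(rows, "col2")
uniques = [r for r in flagged if r["keyChange"] == 1]  # master output link
dupes   = [r for r in flagged if r["keyChange"] == 0]  # duplicates link
```

Note that this split keeps the *first* row of each duplicate group on the master link, which is why it does not, by itself, reject both records.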
-Nripendra Chand
Nripendra Chand wrote: sort on the key columns on which you want to find duplicates using a Sort stage and set "create key change column" to Yes. Then use a Filter or Transformer stage with the following constraints:
1. key change column = 1: put this constraint on the master output link (unique records).
2. key change column = 0: put this constraint to collect duplicates.
I guess the poster needs both records to be rejected, so placing key change column = 1 in the constraint won't work.
You need to do a fork join, using the keys that have key change column = 0, to achieve the objective.
- Zulfi
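The fork-join idea can be sketched in plain Python (illustrative only; the function and data are invented for the example): fork off the keys that occur more than once, then join back to the main stream and reject every row carrying such a key, so *both* members of a duplicate pair are rejected.

```python
from collections import Counter

def reject_duplicate_groups(rows, key):
    """Return (kept, rejected): rows whose key value occurs more than
    once are rejected in full, mimicking a fork join on keyChange = 0."""
    counts = Counter(r[key] for r in rows)          # fork: count per key
    dup_keys = {k for k, n in counts.items() if n > 1}
    kept     = [r for r in rows if r[key] not in dup_keys]  # join back
    rejected = [r for r in rows if r[key] in dup_keys]
    return kept, rejected

rows = [{"col1": 1, "col2": "james"},
        {"col1": 2, "col2": "james"},
        {"col1": 3, "col2": "mary"}]
kept, rejected = reject_duplicate_groups(rows, "col2")
# both "james" rows are rejected; "mary" survives
```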
So, in reality you need to identify and remove all duplicates on column 1, as well as identify and remove all duplicates on column 2.
Which 8.x version of IS are you using? The approach can be different with 8.5 than with 8.0 and 8.1.
Approach this one column at a time.
For 8.0/8.1: Partition/sort on the first column and generate a keyChange column. Use a Transformer to capture the key values which have duplicate entries and output those values, with a flag, to a separate stream; then join that stream back to the main stream to flag, and then delete, all rows with those values.
Repeat this process for the second column.
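The column-at-a-time approach above, run first on column 1 and then on column 2, can be sketched like this (a simplified Python model; the helper name and sample rows are assumptions, not anything from DataStage itself):

```python
from collections import Counter

def drop_dup_groups(rows, key):
    """Keep only rows whose key value occurs exactly once,
    i.e. delete every member of each duplicate group."""
    counts = Counter(r[key] for r in rows)
    return [r for r in rows if counts[r[key]] == 1]

rows = [{"col1": 1, "col2": "mary"},
        {"col1": 1, "col2": "charle"},
        {"col1": 2, "col2": "james"},
        {"col1": 3, "col2": "james"},
        {"col1": 4, "col2": "ann"}]

step1 = drop_dup_groups(rows, "col1")   # removes both col1 = 1 rows
step2 = drop_dup_groups(step1, "col2")  # removes both col2 = "james" rows
# only the {"col1": 4, "col2": "ann"} row survives both passes
```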
For 8.5: The same can be accomplished with a sort followed by a transformer using the new looping and key-change recognition features. The first loop iteration can simply count rows for the key and set a flag based on the result, then the second iteration can keep or drop the records as needed.
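The 8.5 transformer-loop idea — one pass over each sorted key group to count its rows, then a decision to keep or drop the whole group — can be modelled in Python roughly as follows (a sketch of the control flow, not DataStage syntax):

```python
from itertools import groupby

def loop_filter(rows, key):
    """Mimic an 8.5 Transformer loop: for each sorted key group, the
    first pass counts the rows; the group is emitted only if the
    count is 1, so every member of a duplicate group is dropped."""
    kept = []
    sorted_rows = sorted(rows, key=lambda r: r[key])
    for _, group in groupby(sorted_rows, key=lambda r: r[key]):
        g = list(group)       # first loop iteration: gather and count
        if len(g) == 1:       # second iteration: keep or drop the group
            kept.extend(g)
    return kept

rows = [{"col1": 1, "col2": "mary"},
        {"col1": 1, "col2": "charle"},
        {"col1": 2, "col2": "james"}]
result = loop_filter(rows, "col1")
```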
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.