how to capture duplicate rows separately in DataStage

Hi,
I want to capture the duplicate rows in my input file into a separate file. Could anyone help me with this?
Thanks,
Siva
Fork your data stream into two using a Copy stage. Run one stream through an Aggregator that counts the number of rows for each combination of the duplicate-defining keys. Join the two streams (you now have detail row plus count). Using another Copy stage split the stream into two again. Run one stream through a Remove Duplicates stage, and run the other stream through a Filter stage (or Transformer stage) that blocks rows with a count of 1.
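In plain code terms, that design is: count rows per key combination, then route each row by its count. A minimal Python sketch of the same fork/count/join/split logic, assuming an in-memory list of rows and a hypothetical key column named "key" (in DataStage the stages do this over the partitioned stream):

from collections import Counter

def split_duplicates(rows, key_cols):
    # Aggregator: count rows per combination of the duplicate-defining keys
    counts = Counter(tuple(r[c] for c in key_cols) for r in rows)
    # Join plus second Copy: each detail row now carries its count,
    # and the stream is split into two outputs
    uniques, duplicates = [], []
    seen = set()
    for r in rows:
        k = tuple(r[c] for c in key_cols)
        # Remove Duplicates stage: keep only the first row per key
        if k not in seen:
            seen.add(k)
            uniques.append(r)
        # Filter stage: block rows whose key occurs only once
        if counts[k] > 1:
            duplicates.append(r)
    return uniques, duplicates

rows = [{"id": 1, "key": "A"}, {"id": 2, "key": "B"}, {"id": 3, "key": "A"}]
uniq, dups = split_duplicates(rows, ["key"])
print(dups)  # both "A" rows land in the duplicates output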
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Hi Siva,
I had the same requirement a few days back and handled it in the following way, using two jobs:
1. Pass the data to an Aggregator and count the number of records for each key combination. Write the key and count to an intermediate file.
2. Read the intermediate file and filter out the records with Count <= 1 (keeping only keys that occur more than once), then join with the master input file on the keys.
Your final output will then contain the duplicate records.
These jobs are running in my production environment. If you have a better idea, let me know.
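For anyone wanting to prototype this outside DataStage, here is a minimal Python sketch of the two-job flow, assuming hypothetical file names input.csv, key_counts.csv, and duplicates.csv, and a single key column named "key":

import csv
from collections import Counter

def job1_count_keys(input_file, counts_file):
    # Job 1: aggregate counts per key and write the intermediate file
    with open(input_file, newline="") as f:
        counts = Counter(row["key"] for row in csv.DictReader(f))
    with open(counts_file, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["key", "count"])
        for k, n in counts.items():
            w.writerow([k, n])

def job2_extract_duplicates(input_file, counts_file, output_file):
    # Job 2: keep only keys with count > 1, then join back to the master input
    with open(counts_file, newline="") as f:
        dup_keys = {r["key"] for r in csv.DictReader(f) if int(r["count"]) > 1}
    with open(input_file, newline="") as f, open(output_file, "w", newline="") as out:
        reader = csv.DictReader(f)
        w = csv.DictWriter(out, fieldnames=reader.fieldnames)
        w.writeheader()
        for row in reader:
            if row["key"] in dup_keys:
                w.writerow(row)

job1_count_keys("input.csv", "key_counts.csv")
job2_extract_duplicates("input.csv", "key_counts.csv", "duplicates.csv")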
Thanks,
Raamc