how to capture the duplicate records using remove duplicate
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 39
- Joined: Mon May 19, 2008 7:22 am
I am not able to load all the records into the target because of duplicates.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The usual approach is to use a Sort stage that generates a Key Change column, which identifies the duplicates immediately. Another approach uses a fork-join design in which an Aggregator stage determines the count of records for each key. The count is joined back to each row having that key, and any row with a count > 1 can be captured into the duplicates stream, while all rows are still captured into the all-keys stream.
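Both designs can be mimicked outside DataStage to see what each stream would contain. The following is only a rough Python sketch of the logic, not DataStage code: the function names, the `key_change` and `key_count` column names, and the sample rows are all made up for illustration, and it assumes the key change flag is 1 on the first row of each key group and 0 on the rows that follow.

```python
from collections import Counter
from operator import itemgetter


def flag_key_change(rows, key):
    """Sort rows on the key and add a key_change flag (hypothetical column name).

    Assumption: the flag is 1 for the first row of each key group, 0 otherwise.
    """
    ordered = sorted(rows, key=itemgetter(key))  # sorted() is stable
    flagged = []
    previous = object()  # sentinel that never equals a real key value
    for row in ordered:
        current = row[key]
        flagged.append({**row, "key_change": 1 if current != previous else 0})
        previous = current
    return flagged


def split_by_count(rows, key):
    """Fork-join sketch: count rows per key, join the count back, split streams."""
    counts = Counter(row[key] for row in rows)  # the "Aggregator" branch
    all_keys = [{**row, "key_count": counts[row[key]]} for row in rows]  # the join
    duplicates = [row for row in all_keys if row["key_count"] > 1]
    return all_keys, duplicates


# Made-up sample data for illustration only.
sample = [
    {"id": 2, "val": "a"},
    {"id": 1, "val": "b"},
    {"id": 2, "val": "c"},
]

flagged = flag_key_change(sample, "id")
repeats = [row for row in flagged if row["key_change"] == 0]  # second and later rows per key
all_keys, duplicates = split_by_count(sample, "id")
```

The two designs capture different things: filtering on the key change flag yields only the second and later rows of each key, whereas the count-based split yields every row whose key occurs more than once, including the first one.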
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Re: how to capture the duplicate records using remove duplic
nandela.sudheer wrote: i am not able to load all the records into target due to duplicates.
Hi,
Can you explain the requirement clearly?
What I understood is that, for a certain combination of key columns, you have to retain only one record. If that is the requirement, you can achieve it by using the Remove Duplicates stage.
In the Remove Duplicates stage, set the key columns used for removing duplicates, and sort the data on those same key columns.
Proper sorting of the data is required if you are using a Remove Duplicates stage.
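To show why the sort matters, here is a small Python sketch of the same idea: sort on the key columns, keep the first record of each key, and set the rest aside so they can be captured. It is an illustration under assumed names (`remove_duplicates`, the `cust`/`region`/`amt` columns, and the sample rows are hypothetical), not the stage's actual implementation. `itertools.groupby` only groups adjacent rows, so without the sort a repeated key that is not contiguous would slip through as a separate group.

```python
from itertools import groupby
from operator import itemgetter


def remove_duplicates(rows, key_columns):
    """Keep the first row per key; return (retained, dropped).

    key_columns must be a non-empty list of column names.
    """
    key_of = itemgetter(*key_columns)
    ordered = sorted(rows, key=key_of)  # sort on the same key columns first
    retained, dropped = [], []
    for _, group in groupby(ordered, key=key_of):
        first, *rest = group
        retained.append(first)  # the one record kept for this key
        dropped.extend(rest)    # the duplicates, available for capture
    return retained, dropped


# Made-up sample data for illustration only.
rows = [
    {"cust": 1, "region": "N", "amt": 10},
    {"cust": 2, "region": "S", "amt": 5},
    {"cust": 1, "region": "N", "amt": 7},
]

retained, dropped = remove_duplicates(rows, ["cust", "region"])
```

Because `sorted()` is stable, "first" here means the earliest input row for each key; sorting on an additional column would be needed to control which record is retained.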