how to capture the duplicate records using remove duplicate

nandela.sudheer · Post by **nandela.sudheer** » Sun Nov 02, 2008 9:26 am

I am not able to load all the records into the target because of duplicates.

ray.wurlod · Post by **ray.wurlod** » Sun Nov 02, 2008 10:25 am

Remove the duplicates, use a target table that will accept duplicates, or generate an artificial, unique key.

Nagaraj · Post by **Nagaraj** » Sun Nov 02, 2008 3:10 pm

I don't see any option in the Remove Duplicates stage (RDS) to capture the duplicates,
because I believe the duplicates are simply dropped.
But what is the need to capture duplicate records?
I have seen business requirements to catch rejected records.

ray.wurlod · Post by **ray.wurlod** » Sun Nov 02, 2008 7:54 pm

The usual approach is to use a Sort stage in which a Key Change column is generated; the duplicates are then identified immediately. Another approach uses a fork-join design in which an Aggregator determines the count of records for each key. The count is joined back to each row having that key; anything with a count > 1 can be captured into the duplicates stream, while all rows are captured into the all-keys stream.
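The logic of both designs can be sketched outside DataStage. The following Python is only an illustration of the idea, not DataStage code: the sample rows, the `id` key column, and the function names are all made up for the example, and it assumes the Key Change column is 1 on the first row of each key group and 0 on the rows that follow.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical sample data; "id" plays the role of the key column.
rows = [
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b2"},
    {"id": 3, "name": "c"},
]
key = itemgetter("id")


def split_by_key_change(rows, key):
    """Sort on the key, flag the first row of each group, split on the flag."""
    unique, duplicates = [], []
    first_row = True
    previous_key = None
    for row in sorted(rows, key=key):  # sorted() is stable
        current_key = key(row)
        key_change = 1 if first_row or current_key != previous_key else 0
        (unique if key_change == 1 else duplicates).append(row)
        previous_key = current_key
        first_row = False
    return unique, duplicates


def split_by_count(rows, key):
    """Fork-join: count rows per key, join the count back, filter on count > 1."""
    counts = Counter(key(row) for row in rows)          # the "Aggregator"
    all_keys = [dict(row, count=counts[key(row)]) for row in rows]  # the "Join"
    duplicates = [row for row in all_keys if row["count"] > 1]
    return all_keys, duplicates


unique, dup_by_flag = split_by_key_change(rows, key)
all_keys, dup_by_count = split_by_count(rows, key)
```

Note the difference between the two: the key-change split keeps one row per key in the main stream and sends only the extra rows to the duplicates stream, whereas the count-based split sends every row of a repeated key to the duplicates stream.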

sandeepgs · Post by **sandeepgs** » Sun Nov 02, 2008 11:12 pm

nandela.sudheer wrote: I am not able to load all the records into target due to duplicates.

Hi,

Can you explain the requirement clearly?

What I understood is that for a certain combination of key columns you have to retain only one record. If this is the requirement, you can achieve it by using the Remove Duplicates stage.

In the RDS, set the key columns for removing duplicates. Sort the data on the same key columns you are setting for removing duplicates.

Proper sorting of the data is required if you are using a Remove Duplicates stage.
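To see why the sort matters, here is a small Python sketch (not DataStage code). It assumes the de-duplication only compares adjacent rows, which is how `itertools.groupby` behaves; the `remove_duplicates` helper, its `retain` parameter, and the sample rows are hypothetical names for the example.

```python
from itertools import groupby
from operator import itemgetter


def remove_duplicates(rows, key, retain="first"):
    """Keep one row per run of adjacent rows that share the same key."""
    result = []
    for _, group in groupby(rows, key=key):
        group = list(group)
        result.append(group[0] if retain == "first" else group[-1])
    return result


# Hypothetical sample data; "id" is the key column.
rows = [
    {"id": 2, "v": "x"},
    {"id": 1, "v": "y"},
    {"id": 2, "v": "z"},
]
key = itemgetter("id")

# Unsorted input: the two id=2 rows are not adjacent, so both survive.
unsorted_result = remove_duplicates(rows, key)

# Sorted input: the id=2 rows are adjacent, so only one is retained.
sorted_result = remove_duplicates(sorted(rows, key=key), key)
```

With the unsorted input all three rows come through; after sorting on the key, only two rows remain.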

Nagaraj · Post by **Nagaraj** » Mon Nov 03, 2008 8:57 am

Sandeep, I think he wants to capture the duplicate records in another stream in the same job, or something like that.

DSXchange
