how to capture the duplicate records using remove duplicate

Posted: Sun Nov 02, 2008 9:26 am
by nandela.sudheer
I am not able to load all the records into the target because of duplicates.

Posted: Sun Nov 02, 2008 10:25 am
by ray.wurlod
Remove the duplicates, use a target table that will accept duplicates, or generate an artificial unique key.

Posted: Sun Nov 02, 2008 3:10 pm
by Nagaraj
I don't see any options available in the RDS stage to handle the duplicates,
because I believe the duplicates would simply be dropped.
But, by the way, what is the need to capture duplicate records?
I have seen business requirements to catch the rejected records!

Posted: Sun Nov 02, 2008 7:54 pm
by ray.wurlod
The usual approach to this is to use a Sort stage within which a Key Change column is generated; the duplicates are then identified immediately. Another approach uses a fork-join design in which an Aggregator determines the count of records for each key. The count is joined back to each row having that key, and any row having a count > 1 can be captured into the duplicates stream, while all rows are captured into the all-keys stream.
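
For anyone who wants to see the logic of those two approaches outside DataStage, here is a minimal sketch in plain Python. It is only an illustration, not DataStage code: the sample rows and the function names split_by_key_change and split_by_count are made up for this example, and the first column of each row is assumed to be the key.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical sample rows: (key, payload). Not taken from the original poster's data.
rows = [(2, "b1"), (1, "a1"), (2, "b2"), (3, "c1"), (1, "a2"), (2, "b3")]


def split_by_key_change(rows):
    """Sort on the key and flag the first row of each key group (key change = 1).

    Rows with key change = 0 are repeats of a key already seen.
    """
    firsts, repeats = [], []
    previous_key = object()  # sentinel that compares unequal to every real key
    for row in sorted(rows, key=itemgetter(0)):  # sorted() is stable
        key_change = 1 if row[0] != previous_key else 0
        (firsts if key_change else repeats).append(row)
        previous_key = row[0]
    return firsts, repeats


def split_by_count(rows):
    """Fork-join: count rows per key, join the count back, capture count > 1."""
    counts = Counter(key for key, _ in rows)  # the "Aggregator" branch
    all_keys = [(key, payload, counts[key]) for key, payload in rows]  # the join
    duplicates = [row for row in all_keys if row[2] > 1]
    return all_keys, duplicates


firsts, repeats = split_by_key_change(rows)
all_keys, duplicates = split_by_count(rows)
```

Note the difference in what each one captures: the key-change version sends only the second and later rows of a key to the repeats list, while the count version sends every row whose key occurs more than once, including the first one.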

Re: how to capture the duplicate records using remove duplic

Posted: Sun Nov 02, 2008 11:12 pm
by sandeepgs
nandela.sudheer wrote:i am not able to load all the records into target due to duplicates.

Hi,

Can you explain the requirement clearly?

What I understood is that, for a certain combination of key columns, you have to retain only one record. If this is the requirement, you can achieve it by using the Remove Duplicates stage.

In the RDS, set the key columns for removing duplicates. Sort the data on the same key columns you are setting for removing duplicates.

Proper sorting of the data is required if you are using a Remove Duplicates stage.
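
To show why the sort matters, here is a small Python sketch. It assumes the stage behaves like a comparison of each row against the previous row only, which is a simplification for illustration; the function name remove_adjacent_duplicates and the sample rows are hypothetical.

```python
def remove_adjacent_duplicates(rows, key):
    """Keep a row only when its key differs from the key of the previous row."""
    kept = []
    first = True
    previous_key = None
    for row in rows:
        current_key = key(row)
        if first or current_key != previous_key:
            kept.append(row)
        previous_key = current_key
        first = False
    return kept


def by_key(row):
    # Assumption for this example: the first column is the key.
    return row[0]


# Hypothetical input where the two rows with key 1 are not adjacent.
rows = [(1, "a1"), (2, "b1"), (1, "a2")]

unsorted_result = remove_adjacent_duplicates(rows, by_key)
sorted_result = remove_adjacent_duplicates(sorted(rows, key=by_key), by_key)
```

On the unsorted input nothing is removed, because the duplicate keys never sit next to each other; after sorting on the key, only one row per key is retained.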

Posted: Mon Nov 03, 2008 8:57 am
by Nagaraj
Sandeep, I think he wants to capture the duplicate records in another stream in the same job, or something like that.