How to capture the duplicate records using Remove Duplicates

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

nandela.sudheer
Participant
Posts: 39
Joined: Mon May 19, 2008 7:22 am


Post by nandela.sudheer »

I am not able to load all the records into the target because of duplicates.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Remove the duplicates, use a target table that will accept duplicates, or generate an artificial unique key.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Nagaraj
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

I don't see any option in the RDS stage to handle the duplicates,
because I believe the duplicates are simply dropped.
But by the way, what is the need to capture duplicate records?
I have seen business requirements to catch the rejected records...!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The usual approach is either to use a Sort stage that generates a Key Change column, which identifies the duplicates immediately (the first row of each key group gets 1, every later row with the same key gets 0), or to use a fork-join design in which an Aggregator determines the count of records for each key. The count is joined back to each row having that key, and any row with a count > 1 can be captured into the duplicates stream, while all rows are also captured into the all-keys stream.
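To make the two designs concrete, here is a rough model of the data flow in plain Python. It is not DataStage code; the record layout (dicts with a key field) and the function names are hypothetical. The first function imitates a Sort stage with a key change column, the second imitates the fork-join with an Aggregator count joined back to each row.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

def split_by_key_change(rows, key):
    """Model of a Sort stage with a key change column (hypothetical helper).
    Rows are sorted on `key` (Python's sort is stable), the first row of each
    group gets keyChange = 1, and every later row in the group gets
    keyChange = 0 and is routed to the duplicates stream."""
    firsts, dups = [], []
    ordered = sorted(rows, key=itemgetter(key))
    for _, group in groupby(ordered, key=itemgetter(key)):
        for i, row in enumerate(group):
            flagged = {**row, "keyChange": 1 if i == 0 else 0}
            (firsts if i == 0 else dups).append(flagged)
    return firsts, dups

def split_by_count(rows, key):
    """Model of the fork-join design (hypothetical helper).
    Counter plays the Aggregator (row count per key); the count is joined
    back to every row. All rows go to the all-keys stream, and rows whose
    key count is > 1 also go to the duplicates stream."""
    counts = Counter(row[key] for row in rows)
    all_keys = [{**row, "count": counts[row[key]]} for row in rows]
    dups = [row for row in all_keys if row["count"] > 1]
    return all_keys, dups
```

Note the difference in what lands in the duplicates stream: the key change approach sends only the second and later occurrences of a key, while the count approach sends every row whose key occurs more than once, including the first.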
sandeepgs
Participant
Posts: 87
Joined: Wed Jul 02, 2008 12:22 am

Re: how to capture the duplicate records using remove duplic

Post by sandeepgs »

nandela.sudheer wrote: I am not able to load all the records into the target because of duplicates.

Hi,

Can you explain the requirement more clearly?

What I understood is that for a certain combination of key columns you have to retain only one record. If this is the requirement, you can achieve it using the Remove Duplicates stage.

In the RDS, set the key columns for removing duplicates, and sort the data on those same key columns.

Proper sorting of the data is required if you are using a Remove Duplicates stage.
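Why the sort matters can be shown with a small Python model of de-duplication that, like the stage, only compares neighbouring rows. This is an illustration under that assumption, not the stage's actual implementation; the function name and the `retain` parameter (loosely mirroring a choice of keeping the first or last duplicate) are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def remove_adjacent_duplicates(rows, key, retain="first"):
    """Keep one row per run of consecutive rows sharing the same `key`.
    Only neighbouring rows are compared, so the input must already be
    sorted (or at least grouped) on `key` for every duplicate to be caught."""
    kept = []
    for _, group in groupby(rows, key=itemgetter(key)):
        group = list(group)
        kept.append(group[0] if retain == "first" else group[-1])
    return kept

rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}]
# Unsorted input: the two id=1 rows are not adjacent, so both survive.
unsorted_result = remove_adjacent_duplicates(rows, "id")
# Sorted input: the id=1 rows become neighbours and only one is kept.
sorted_result = remove_adjacent_duplicates(sorted(rows, key=itemgetter("id")), "id")
```

With the unsorted input all three rows come through; after sorting on the key, only one row per id remains.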
Nagaraj
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

Sandeep, I think he wants to capture the duplicate records in another stream in the same job, or something like that.