Page 1 of 1

how to remove all duplicate records

Posted: Mon May 18, 2009 3:13 am
by shalini11
Hi All,

If the scenario is to remove all the records having duplicate key,the remove duplicate stage or the sort stage removes all except one.But i want to remove all records. Can it be done in Datastage?

Ex:
Col1 Col2 Col3
1 001 002
1 001 003
2 001 004
3 002 003

In this example, Col1 and Col2 are the key for removing duplicates.
I want to remove both the records having duplicate values for these columns.

Can it be implemented in DS or I need to write shell script for the same?


Thanks in adv

Re: how to remove all duplicate records

Posted: Mon May 18, 2009 3:25 am
by Pagadrai
shalini11 wrote: In this example, Col1 and Col2 are the key for removing duplicates.
I want to remove both the records having duplicate values for these columns.
Can it be implemented in DS or I need to write shell script for the same?
Hi,
This can be implemented in DataStage.
First you need to get the count by grouping the records based on Col1 and Col2. You can do this with aggregator. Obtain the values where Count > 1

Then use the above result as lookup.
If lookup matches, then reject those records from incoming source.

* You can implement this logic in single job or in 2 separate jobs.

Posted: Mon May 18, 2009 3:27 am
by shalini11
Thanks for the quick reply

I'll try this and get back to u if i face any difficulty in implementing

Posted: Mon May 18, 2009 4:11 am
by shalini11
Thanks one again for your help

I am able to remove all the duplicate records now with your suggestion

Posted: Mon May 18, 2009 4:04 pm
by ray.wurlod
shalini11 wrote:Thanks for the quick reply

I'll try this and get back to u if i face any difficulty in implementing
Why do you believe that U specifically will be able to assist?

The second person personal pronoun in English is spelled "you", not "u". Please use a professional standard of written English on DSXchange. It makes life easier for those whose first language is not English.