Identify the duplicate records

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
nagarjuna900
Participant
Posts: 35
Joined: Mon Dec 29, 2008 2:22 am
Location: chennai

Identify the duplicate records

Post by nagarjuna900 »

Hi,

In source the data as follows ( having two columns).

Col1 Col2
1 AAA
2 BBB
1 AAA
3 CCC
1 AAA

The required output as follows.
Col1 Col2 Flag
1 AAA -
1 AAA D ( Duplicate Record)
1 AAA D ( Duplicate Record)
2 BBB -
3 CCC -



Thanks a lot,
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Use a fork-join design with one side of the fork using an Aggregator to count the records with a given grouping key. After the Join, add the "(Duplicate record)" text to any record that has a count > 1.

Or, if you have a source that is accessible via SQL, you could use a HAVING clause and a UNION ALL with the duplicates and non-duplicates.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply