Identify the duplicate records

Post by **nagarjuna900** » Sun Apr 18, 2010 8:41 am

Hi,

In the source, the data is as follows (two columns):

Col1 Col2
1 AAA
2 BBB
1 AAA
3 CCC
1 AAA

The required output is as follows:

Col1 Col2 Flag
1 AAA -
1 AAA D (Duplicate Record)
1 AAA D (Duplicate Record)
2 BBB -
3 CCC -
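A minimal Python sketch of the required output, assuming each record is a `(Col1, Col2)` tuple and that "duplicate" means the whole row repeats; `flag_duplicates` is a hypothetical helper name. It sorts the rows so identical records sit together, then treats the first record of each group as the original ("-") and every later copy as a duplicate ("D").

```python
from itertools import groupby


def flag_duplicates(rows):
    """Hypothetical helper: return (col1, col2, flag) rows sorted by key.

    The first row of each group gets "-", every later copy gets "D".
    Assumes the whole (Col1, Col2) row is the duplicate key.
    """
    result = []
    # Sorting brings identical rows together; groupby with no key function
    # then groups consecutive rows that compare equal as whole tuples.
    for _key, group in groupby(sorted(rows)):
        for i, (col1, col2) in enumerate(group):
            # Only the first record in each group is kept unflagged.
            result.append((col1, col2, "-" if i == 0 else "D"))
    return result


source = [(1, "AAA"), (2, "BBB"), (1, "AAA"), (3, "CCC"), (1, "AAA")]
for row in flag_duplicates(source):
    print(*row)
```

Since the copies are identical, it does not matter which one is labelled the original; if the key were only `Col1`, the sort and the grouping would need a key function and a rule for picking the first record.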

Thanks a lot,

Post by **ray.wurlod** » Sun Apr 18, 2010 1:00 pm

Use a fork-join design in which one side of the fork uses an Aggregator to count the records for each grouping key. After the Join, add the "(Duplicate record)" text to any record whose count is > 1.
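The fork-join idea can be sketched in plain Python as follows. This is only an analogy to the job design, not DataStage code: `collections.Counter` stands in for the Aggregator branch, a dictionary lookup stands in for the Join, and the grouping key is assumed to be the whole `(Col1, Col2)` row. `fork_join_flag` is a hypothetical name.

```python
from collections import Counter


def fork_join_flag(rows):
    """Hypothetical sketch of the fork-join design on (Col1, Col2) tuples."""
    # Fork branch 1 ("Aggregator"): count records per grouping key.
    counts = Counter(rows)
    # Fork branch 2 is the original stream; the "Join" attaches each
    # record's group count by looking up its key.
    joined = [(col1, col2, counts[(col1, col2)]) for col1, col2 in rows]
    # After the join, flag every record whose group count is > 1.
    return [(c1, c2, "D" if n > 1 else "-") for c1, c2, n in joined]


source = [(1, "AAA"), (2, "BBB"), (1, "AAA"), (3, "CCC"), (1, "AAA")]
for row in fork_join_flag(source):
    print(*row)
```

Because the flag depends only on the group count, all three `1 AAA` records are marked `D`, including the first one, and the input order is preserved. Matching the requested output exactly would also require distinguishing the first record of each group, for example by sorting on the key and numbering the records within it.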

Or, if your source is accessible via SQL, you could use GROUP BY with a HAVING clause to separate the duplicates from the non-duplicates, then combine the two sets with a UNION ALL.
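A minimal sketch of the SQL route, using Python's built-in `sqlite3` module with an in-memory database as a stand-in for "a source accessible via SQL". The table name `src` and the helper `flag_with_sql` are hypothetical, and the grouping key is assumed to be `(Col1, Col2)`. The first branch joins every source row to the groups with `HAVING COUNT(*) > 1`; the second branch returns the groups with exactly one row. `UNION ALL` then stacks the two sets.

```python
import sqlite3

# Hypothetical table "src" with the two source columns.
QUERY = """
SELECT s.Col1, s.Col2, 'D' AS Flag      -- every copy of a duplicated row
FROM src AS s
JOIN (SELECT Col1, Col2 FROM src
      GROUP BY Col1, Col2
      HAVING COUNT(*) > 1) AS dup
  ON s.Col1 = dup.Col1 AND s.Col2 = dup.Col2
UNION ALL
SELECT Col1, Col2, '-' AS Flag          -- rows that occur exactly once
FROM src
GROUP BY Col1, Col2
HAVING COUNT(*) = 1
ORDER BY 1, 2                           -- sort by the first two output columns
"""


def flag_with_sql(rows):
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE src (Col1 INTEGER, Col2 TEXT)")
        conn.executemany("INSERT INTO src VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


source = [(1, "AAA"), (2, "BBB"), (1, "AAA"), (3, "CCC"), (1, "AAA")]
for row in flag_with_sql(source):
    print(*row)
```

Like the fork-join version, this flags every copy of a duplicated row, including the first.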