Eliminate duplicates from file and capture in flatfile
Can anyone assist me in capturing all the duplicate records coming from a CSV file into a separate file, without using the Aggregator stage?
Re: Eliminate duplicates from file and capture in flatfile
You should be able to achieve this using stage variables in the Transformer.
DD
Success is right around the corner
Stage variables will work but you would also need to sort the data.
Use sort stage and filter
I think the best way to capture duplicates is by using "Sort Stage".
Use the columns on which you want to detect duplicates as your sort keys and enable "Create Key Change Column". This adds an extra column, "keyChange"; every row with a value of '0' (zero) in "keyChange" is a duplicate of the preceding row.
P.S.: Make sure you hash partition on key columns.
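The key-change idea above can be sketched in plain Python (this is an illustration of the technique, not DataStage code; the function and column names are made up for the example): on key-sorted input, the first row of each key group gets keyChange = 1 and every following row of the same key gets 0, so the rows flagged 0 are the duplicates.

```python
# Illustration (hypothetical helper, not DataStage code) of the Sort stage's
# "Create Key Change Column" behaviour: on key-sorted input, keyChange is 1
# for the first row of each key group and 0 for every subsequent duplicate.

def add_key_change(rows, key_cols):
    """Append a keyChange flag: 1 = first occurrence of the key, 0 = duplicate."""
    out = []
    prev_key = None
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        out.append({**row, "keyChange": 1 if key != prev_key else 0})
        prev_key = key
    return out

# The input must already be sorted on the key columns.
rows = sorted(
    [{"id": 1, "name": "a"}, {"id": 1, "name": "b"}, {"id": 2, "name": "c"}],
    key=lambda r: r["id"],
)
flagged = add_key_change(rows, ["id"])
duplicates = [r for r in flagged if r["keyChange"] == 0]
```

A downstream filter (or Transformer constraint) on keyChange = 0 then sends exactly the duplicate rows to the separate flat file.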
Re: Use sort stage and filter
DataStage Server Edition is being used :D
ssbhas wrote: P.S.: Make sure you hash partition on key columns.
DD
Success is right around the corner
Eliminate duplicates from file and capture in flatfile
Duplicates based on which specific columns?
Eliminate duplicates from file and capture in flatfile
ArndW wrote: Stage variables will work but you would also need to sort the data. ...
Can you please explain the approach in the Transformer using stage variables once the data has been sorted?
You would need to answer Srini's question in order to get a good answer. Basically, stage variables are used to store values from the previous row and compare them to the current row. You would compare those columns you wish to detect duplicates on and, using constraints, skip rows with duplicates. Again, you would need to sort the data so that duplicates can be detected.
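As a rough analogue of that stage-variable technique (sketched in Python for clarity; the function name and single-column key are assumptions for the example, not DataStage syntax): a variable holds the previous row's key, each incoming row is compared against it, and constraints route the row to either the "first occurrence" link or the "duplicates" link before the variable is updated.

```python
# Rough Python analogue (not DataStage code) of the Transformer
# stage-variable approach: remember the previous row's key, compare it
# with the current row's key, and route duplicates to a separate output.

def split_duplicates(sorted_rows, key):
    uniques, dups = [], []
    prev = None                  # plays the role of the stage variable
    for row in sorted_rows:
        if row[key] == prev:
            dups.append(row)     # constraint: route to the duplicates link
        else:
            uniques.append(row)  # constraint: first occurrence of this key
        prev = row[key]          # update the "stage variable" for the next row
    return uniques, dups

# As in the thread, this only works if the data is sorted on the key first.
data = sorted([{"k": "x"}, {"k": "y"}, {"k": "x"}], key=lambda r: r["k"])
uniques, dups = split_duplicates(data, "k")
```

The essential point the posts above make carries over: the comparison only detects all duplicates if identical keys are adjacent, which is why the sort comes first.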