Eliminate duplicates from file and capture in flatfile

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Jagan617
Participant
Posts: 42
Joined: Thu Jun 05, 2008 7:37 pm

Eliminate duplicates from file and capture in flatfile

Post by Jagan617 »

Can anyone help me capture all the duplicate records coming from a CSV file into a separate file, without using the Aggregator stage?
ddevdutt
Participant
Posts: 47
Joined: Wed Aug 22, 2007 2:38 pm

Re: Eliminate duplicates from file and capture in flatfile

Post by ddevdutt »

You should be able to achieve this using stage variables in the Transformer.
DD

Success is right around the corner
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Stage variables will work but you would also need to sort the data.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Please define duplicates.

Are you looking for identical rows or duplicates by specific columns?
ssbhas
Premium Member
Posts: 8
Joined: Thu Jul 21, 2005 11:02 pm

Use sort stage and filter

Post by ssbhas »

I think the best way to capture duplicates is to use the Sort stage.

Define the columns on which you want to find duplicates as your sort keys and enable "Create Key Change Column". This adds a column named "keyChange", which is 1 for the first row of each key group and 0 for every later row in that group. All rows with a value of 0 (zero) in "keyChange" are therefore duplicates.

P.S.: Make sure you hash partition on key columns.
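
For readers who want to see the key-change idea outside DataStage, here is a minimal Python sketch. It is not DataStage code: the function name `add_key_change` and the sample rows are made up for illustration, and it only assumes the behaviour described above (1 on the first row of each key group, 0 on the rest).

```python
from itertools import groupby
from operator import itemgetter


def add_key_change(rows, key_columns):
    """Sort rows by key_columns and tag each row with keyChange.

    keyChange is 1 for the first row of a key group and 0 for repeats.
    (Hypothetical helper; it mimics the column described in the post.)
    """
    get_key = itemgetter(*key_columns)
    tagged = []
    # sorted() is stable, so rows with equal keys keep their input order.
    for _, group in groupby(sorted(rows, key=get_key), key=get_key):
        for position, row in enumerate(group):
            tagged.append({**row, "keyChange": 1 if position == 0 else 0})
    return tagged


# Illustrative data only.
rows = [
    {"id": 2, "name": "b"},
    {"id": 1, "name": "a"},
    {"id": 2, "name": "c"},
]
tagged = add_key_change(rows, ["id"])

# The "filter" step: keep only rows flagged as repeats.
duplicates = [row for row in tagged if row["keyChange"] == 0]
```

Here `duplicates` holds only the second row with `id` 2. The first row of each group is treated as the original, so it is not captured.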
ddevdutt
Participant
Posts: 47
Joined: Wed Aug 22, 2007 2:38 pm

Re: Use sort stage and filter

Post by ddevdutt »

ssbhas wrote: P.S.: Make sure you hash partition on key columns.
DataStage Server Edition is being used :D
Jagan617
Participant
Posts: 42
Joined: Thu Jun 05, 2008 7:37 pm

Eliminate duplicates from file and capture in flatfile

Post by Jagan617 »

Duplicates by specific columns.
Jagan617
Participant
Posts: 42
Joined: Thu Jun 05, 2008 7:37 pm

Eliminate duplicates from file and capture in flatfile

Post by Jagan617 »

ArndW wrote: Stage variables will work but you would also need to sort the data. ...

Can you please explain how to use stage variables in the Transformer once the data has been sorted?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

You would need to answer Srini's question in order to get a good answer. Basically, stage variables are used to store values from the previous row and compare them to the current row. You would compare those columns you wish to detect duplicates on and, using constraints, skip rows with duplicates. Again, you would need to sort the data so that duplicates can be detected.
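
The previous-row comparison described above can be sketched in plain Python. This is only an illustration of the logic, not Transformer syntax: `split_duplicates`, the column names `id` and `name`, and the in-memory CSV are all assumptions. The `previous_key` variable plays the part of the stage variable, and the `if`/`else` plays the part of the constraints on the two output links.

```python
import csv
import io


def split_duplicates(sorted_rows, key_columns):
    """Split rows already sorted on key_columns into (unique, duplicates)."""
    previous_key = None  # stands in for the stage variable
    unique, duplicates = [], []
    for row in sorted_rows:
        current_key = tuple(row[column] for column in key_columns)
        if current_key == previous_key:
            duplicates.append(row)  # same key as the previous row
        else:
            unique.append(row)      # first row seen for this key
        previous_key = current_key  # remember the key for the next row
    return unique, duplicates


# Illustrative input; a real job would read the CSV file instead.
source = io.StringIO("id,name\n2,b\n1,a\n2,c\n1,d\n")
rows = sorted(csv.DictReader(source), key=lambda r: r["id"])  # sort first
unique, duplicates = split_duplicates(rows, ["id"])

# Write the duplicates to a separate flat file (in memory here).
duplicate_file = io.StringIO()
writer = csv.DictWriter(duplicate_file, fieldnames=["id", "name"])
writer.writeheader()
writer.writerows(duplicates)
```

Without the sort step, equal keys would not sit on adjacent rows and the comparison would miss them. This sketch sends only the second and later occurrences of a key to the duplicates output. If the first occurrence should be captured too, the logic would need to change.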