Capturing duplicates

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

adityavarma
Premium Member
Posts: 104
Joined: Thu Jul 12, 2007 11:32 pm
Location: Canada

Capturing duplicates

Post by adityavarma »

Hi,
I have a requirement where, if my source file contains duplicates, I need to capture all of the duplicate records and load them into another table.

For example:
101,test,austraila
101,test,austraila
202,test1,india

I need to capture both of the first two records and load them into a table.

I have done it with the Sort and Filter stages, but with that approach one record goes down one link and the other goes down another link.

Can anyone please suggest how to proceed with this?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Depending upon what you want to do there are several stages and methods available to you.

It is not quite clear from your description what you want to achieve. In your example, the "101" row is duplicated. Do you want both records to go down one link, or the first to go down one link and subsequent duplicates to go down another path?
adityavarma
Premium Member
Posts: 104
Joined: Thu Jul 12, 2007 11:32 pm
Location: Canada

Post by adityavarma »

ArndW,

I want both of the (101) records to go down one link.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Assuming you are looking to extract records with duplicate ID values, you can do this by forking your data stream: send one branch into an Aggregator, then join it back into the main stream to pick out the IDs with count > 1.
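As an illustration only (outside DataStage), the fork-join idea can be sketched with a two-pass awk over the sample data from the original post; the file name src.txt is hypothetical:

```shell
# Sample data from the post (hypothetical file name src.txt).
printf '101,test,austraila\n101,test,austraila\n202,test1,india\n' > src.txt

# Pass 1 (the "aggregator" fork): count rows per ID in the first read.
# Pass 2 (the join back): emit only rows whose ID count is greater than 1.
awk -F, 'NR==FNR {cnt[$1]++; next} cnt[$1] > 1' src.txt src.txt
```

Both 101 rows come out together, which is the "all duplicates down one link" behaviour being asked for; changing the condition to cnt[$1] == 1 gives the unique stream instead.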

If the volume is low, I suggest running the Unix uniq command to obtain these counts.
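For the low-volume case, a sketch of that uniq approach on the sample data (the file name dups.txt is hypothetical, and -D is a GNU uniq option):

```shell
# Sample data from the post (hypothetical file name dups.txt).
printf '101,test,austraila\n101,test,austraila\n202,test1,india\n' > dups.txt

# uniq needs sorted input; -c prefixes each distinct row with its count.
sort dups.txt | uniq -c

# With GNU uniq, -D prints every member of each duplicated group.
sort dups.txt | uniq -D
```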
abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Post by abhilashnair »

You can use an Aggregator stage here... take a count of the key column; if the count is greater than 1, route the record to an output via a Filter.
ds_dwh
Participant
Posts: 39
Joined: Fri May 14, 2010 6:06 am

Re: Capturing duplicates

Post by ds_dwh »

Hi,

Use a Sort stage and set properties such as Allow Duplicates = True and
Create Key Change Column = True.

In the Filter stage you can then write conditions like:
where keyChange = 0 ----> to capture duplicate records

where keyChange = 1 ----> to capture unique records
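For illustration only, the key-change logic on sorted input can be sketched in awk as a stand-in for the Sort stage's keyChange column (the file name sorted.txt is hypothetical):

```shell
# Sorted sample data from the thread (hypothetical file name sorted.txt).
printf '101,test,austraila\n101,test,austraila\n202,test1,india\n' > sorted.txt

# keyChange = 1 on the first row of each key, 0 on subsequent rows of that key.
awk -F, '{kc = ($1 != prev) ? 1 : 0; prev = $1; print kc "," $0}' sorted.txt
```

Note that filtering on keyChange = 0 captures only the second and later rows of each duplicate group; the first row of the group still carries keyChange = 1, so this alone does not capture *all* duplicates.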
ANJI
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You missed the "capture all duplicates" part of the requirement, hence the other suggestions for a fork-join design.
-craig

"You can never have too many knives" -- Logan Nine Fingers
adityavarma
Premium Member
Posts: 104
Joined: Thu Jul 12, 2007 11:32 pm
Location: Canada

Post by adityavarma »

Thank you all for your responses.

The requirement is that I want to capture all of the duplicates into one link.

Example:
10001 sainath
10001 andrw
10002 aditya
10003 dsdwh

I want both of the 10001 records (sainath and andrw) to be sent to one link, and the 10002 and 10003 records to another link.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

We know... and you've been given the solution for that.
-craig

"You can never have too many knives" -- Logan Nine Fingers