To get only unique records

SagarMelam
Participant
Posts: 26
Joined: Mon Apr 21, 2008 4:03 am
Location: Amalapuram


Post by SagarMelam »

Hi,
Our requirement is this: suppose we are getting the values 1,1,1,2,3,4,5 for the primary key; then we should populate only 2,3,4,5, which are unique. How can we implement this scenario in DataStage?

Regards,
Sagar
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

What is the source ? Why do you need to do this ?

Maybe it is simple in source.
SagarMelam
Participant
Posts: 26
Joined: Mon Apr 21, 2008 4:03 am
Location: Amalapuram

Post by SagarMelam »

The source is an XML file, and that is the client's requirement.
Sagar
gssr
Participant
Posts: 243
Joined: Fri Jan 09, 2009 12:51 am
Location: India

Post by gssr »

Hope this works:
Aggregate by the key with the Count option, then filter on the count value: drop the keys whose count is greater than 1 and keep those whose count is 1.
RAJ
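
For readers who want to check the logic outside DataStage, here is a minimal Python sketch of the count-then-filter idea. It is not DataStage code; the function name `keys_seen_once` is hypothetical, and the `Counter` only stands in for what an Aggregator stage with a count output would produce.

```python
from collections import Counter

def keys_seen_once(keys):
    """Return the keys that occur exactly once, in first-seen order."""
    keys = list(keys)
    # Stand-in for the aggregation step: one count per key value.
    counts = Counter(keys)
    # Stand-in for the filter step: keep only keys whose count is 1.
    return [k for k in keys if counts[k] == 1]

print(keys_seen_once([1, 1, 1, 2, 3, 4, 5]))  # [2, 3, 4, 5]
```

Note that key 1 is dropped entirely rather than reduced to a single row, which is what the original question asks for.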
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Split the data into two links. One link does an aggregation to get the record count per key; that count is then used on the other link to locate the duplicates (record count > 1).
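
As a rough illustration of that two-link design on full records rather than bare keys, the sketch below uses Python, not DataStage. The function name `fork_join_unique` and the sample column names `id` and `val` are assumptions made up for the example.

```python
from collections import Counter

def fork_join_unique(rows, key):
    """Keep only rows whose key value appears exactly once in the input."""
    rows = list(rows)
    # Link 1: aggregate the stream down to one record count per key.
    counts = Counter(row[key] for row in rows)
    # Link 2 joined back to link 1: every original row looks up its
    # key's count; rows with a count above 1 are duplicates and are dropped.
    return [row for row in rows if counts[row[key]] == 1]

sample = [
    {"id": 1, "val": "a"},
    {"id": 1, "val": "b"},
    {"id": 1, "val": "c"},
    {"id": 2, "val": "d"},
    {"id": 3, "val": "e"},
]
print(fork_join_unique(sample, "id"))
# [{'id': 2, 'val': 'd'}, {'id': 3, 'val': 'e'}]
```

The second link carries the complete rows, so the non-key columns survive the join; the aggregation link only needs the key.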
varaprasad
Premium Member
Posts: 34
Joined: Fri May 16, 2008 6:24 am

Post by varaprasad »

Looks like an interviewer's requirement.

1. Capture the duplicates into a file
2. Do a lookup on this file to remove all the records having duplicates.

You may have to split this into two jobs.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

This is a fairly typical "fork join" design with one tine of the fork calculating the count.

When is the interview?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vinnz
Participant
Posts: 92
Joined: Tue Feb 17, 2004 9:23 pm

Post by vinnz »

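Another possibility may be to (1) sort by primary key, creating a key change column, (2) deduplicate, retaining the last record, and then (3) filter using keychangecol=1 or keychangecol<>0. The last record of a key carries keychangecol=1 only when it is also the first record of that key, i.e. when the key occurs once.

Probably inefficient compared to the fork join design, though.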
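
To see why retaining the last record matters, here is a small Python sketch of the three steps. It is an imitation of the logic, not DataStage code: the function name `unique_via_key_change` is hypothetical, and the flag is assumed to behave like a key change column that is 1 on the first row of each key group and 0 on the rest.

```python
from operator import itemgetter

def unique_via_key_change(rows, key):
    """Sort, flag key changes, keep the last row per key, filter on flag == 1."""
    # Step 1: sort by key and flag the first row of each key group with 1.
    ordered = sorted(rows, key=itemgetter(key))
    flagged = []
    previous = object()  # sentinel that never equals a real key value
    for row in ordered:
        flagged.append((row, 1 if row[key] != previous else 0))
        previous = row[key]
    # Step 2: deduplicate, retaining the LAST row seen for each key.
    last_per_key = {}
    for row, flag in flagged:
        last_per_key[row[key]] = (row, flag)
    # Step 3: the last row still has flag 1 only if its group had one row.
    return [row for row, flag in last_per_key.values() if flag == 1]

sample = [{"id": k} for k in [1, 1, 1, 2, 3, 4, 5]]
print([r["id"] for r in unique_via_key_change(sample, "id")])  # [2, 3, 4, 5]
```

Retaining the first record instead would leave every surviving row with a flag of 1, so the final filter would no longer separate unique keys from duplicated ones.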