Page 1 of 1

To get only unique records

Posted: Thu Dec 03, 2009 5:36 am
by SagarMelam
Hi ,
Our requirment is that suppose we are getting values 1,1,1,2,3,4,5 for the primary key then we should only populate 2,3,4,5 which are unique.how can we impement this scenario in datastage.

Regards,
Sagar

Posted: Thu Dec 03, 2009 5:39 am
by Sainath.Srinivasan
What is the source ? Why do you need to do this ?

Maybe it is simple in source.

Posted: Thu Dec 03, 2009 5:49 am
by SagarMelam
The source is an XML file and that is the client's requirment

Posted: Thu Dec 03, 2009 6:04 am
by gssr
Hope this work,
Aggregate by the key with the Count option and then filter it using the count value which is greater than 1

Posted: Thu Dec 03, 2009 6:05 am
by Sainath.Srinivasan
Split the data into two links with one link doing an aggregation to get record count (for record count > 1) and used in the other link to locate duplicates.

Posted: Thu Dec 03, 2009 6:12 am
by varaprasad
Looks like an interviewer's requirement.

1. Capture the duplicates into a file
2. Do a lookup on this file to remove all the records having duplicates.

You may have to split this into two jobs.

Posted: Thu Dec 03, 2009 2:32 pm
by ray.wurlod
This is a fairly typical "fork join" design with one tine of the fork calculating the count.

When is the interview?

Posted: Thu Dec 03, 2009 4:11 pm
by vinnz
Another possibility may be to (1) Sort by primary key creating a key change column (2) Deduplicate retaining the last record and then (3) filter using keychangecol=1 or keychangecol<>0

Probably inefficient compared to the fork join design though ..