To get only unique records

SagarMelam · Post by **SagarMelam** » Thu Dec 03, 2009 5:36 am

Hi ,
Our requirment is that suppose we are getting values 1,1,1,2,3,4,5 for the primary key then we should only populate 2,3,4,5 which are unique.how can we impement this scenario in datastage.

Regards,
Sagar

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Thu Dec 03, 2009 5:39 am

What is the source ? Why do you need to do this ?

Maybe it is simple in source.

SagarMelam · Post by **SagarMelam** » Thu Dec 03, 2009 5:49 am

The source is an XML file and that is the client's requirment

gssr · Post by **gssr** » Thu Dec 03, 2009 6:04 am

Hope this work,
Aggregate by the key with the Count option and then filter it using the count value which is greater than 1

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Thu Dec 03, 2009 6:05 am

Split the data into two links with one link doing an aggregation to get record count (for record count > 1) and used in the other link to locate duplicates.

varaprasad · Post by **varaprasad** » Thu Dec 03, 2009 6:12 am

Looks like an interviewer's requirement.

1. Capture the duplicates into a file
2. Do a lookup on this file to remove all the records having duplicates.

You may have to split this into two jobs.

ray.wurlod · Post by **ray.wurlod** » Thu Dec 03, 2009 2:32 pm

This is a fairly typical "fork join" design with one tine of the fork calculating the count.

When is the interview?

vinnz · Post by **vinnz** » Thu Dec 03, 2009 4:11 pm

Another possibility may be to (1) Sort by primary key creating a key change column (2) Deduplicate retaining the last record and then (3) filter using keychangecol=1 or keychangecol<>0

Probably inefficient compared to the fork join design though ..