Remove Duplicates - Retain both Duplicates

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

jzajde1
Premium Member
Posts: 21
Joined: Wed Jan 07, 2015 8:10 am

Post by jzajde1 »

Hello,

Is there a way I can retain both duplicates from a stage in DataStage?
The primary key is column 1.

Ex.

Column1|Column2|Column3

111|EA|203
111|EA|201
112|EA|200
113|EA|200

I want to remove both records where column 1 = 111.

Please advise.

Thanks
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Could you clarify whether you want to retain (keep) the duplicates, remove them, or do something in between, such as routing one or both to their own separate stage?
Choose a job you love, and you will never have to work a day in your life. - Confucius
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Create a fork-join to identify the count for each key. Downstream of the Join, create a filter that passes only those key values for which the count is 1.
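For anyone who wants to see the logic outside of DataStage, here is a rough Python sketch of the same idea, not a job design. It assumes the sample rows from the first post; the function name `fork_join_split` and its `key_index` parameter are made up for illustration. One branch counts rows per key, the count is joined back onto every row, and a filter splits on the count. A second output is included for the rows whose count is greater than 1, in case those need to go to their own link.

```python
from collections import Counter

# Sample rows from the first post: Column1|Column2|Column3
rows = [
    ("111", "EA", "203"),
    ("111", "EA", "201"),
    ("112", "EA", "200"),
    ("113", "EA", "200"),
]


def fork_join_split(rows, key_index=0):
    """Hypothetical helper: mimic a fork-join followed by a filter."""
    # Fork, aggregation branch: count the rows for each key value.
    counts = Counter(row[key_index] for row in rows)
    # Join: attach the per-key count back onto every original row.
    joined = [(row, counts[row[key_index]]) for row in rows]
    # Filter: count == 1 means the key is unique; count > 1 means duplicated.
    unique = [row for row, n in joined if n == 1]
    duplicates = [row for row, n in joined if n > 1]
    return unique, duplicates


unique_rows, duplicate_rows = fork_join_split(rows)
print("unique:", unique_rows)
print("duplicates:", duplicate_rows)
```

In job terms, the Counter plays the part of the aggregation branch, the list comprehension that builds `joined` plays the part of the Join, and the two final lists correspond to two output links from the filter.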
Last edited by ray.wurlod on Tue Feb 10, 2015 4:52 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yup, fork that join. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
jzajde1
Premium Member
Posts: 21
Joined: Wed Jan 07, 2015 8:10 am

Post by jzajde1 »

qt_ky:

I want to retain (keep) the records and route them to their own stage.

chulett & ray.wurlod: thank you for your posts. I will test the fork-join and reply.
ShaneMuir
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Just a question: what is the data source in this process? If it's a DB, there might be ways of avoiding a fork-join by incorporating the identification of potential duplicates into your select query.
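As a rough illustration of what that could look like if the source were a database, the sketch below uses an in-memory SQLite table as a stand-in. The table name `src`, the lowercase column names, and the `key_count` alias are assumptions for the example, and the exact SQL would depend on the actual database. The select joins each row to a per-key count, so duplicates are already flagged when the rows arrive in the job.

```python
import sqlite3

# In-memory SQLite stands in for the real source database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (column1 TEXT, column2 TEXT, column3 TEXT)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?)",
    [
        ("111", "EA", "203"),
        ("111", "EA", "201"),
        ("112", "EA", "200"),
        ("113", "EA", "200"),
    ],
)

# Each row is returned together with the number of rows sharing its key.
query = """
SELECT s.column1, s.column2, s.column3, k.key_count
FROM src AS s
JOIN (
    SELECT column1, COUNT(*) AS key_count
    FROM src
    GROUP BY column1
) AS k ON k.column1 = s.column1
ORDER BY s.column1, s.column3
"""
flagged = conn.execute(query).fetchall()

# key_count > 1 marks rows whose key appears more than once.
duplicate_rows = [row[:3] for row in flagged if row[3] > 1]
unique_rows = [row[:3] for row in flagged if row[3] == 1]
print("duplicates:", duplicate_rows)
print("unique:", unique_rows)
```

With the count already on each row, a single downstream filter on `key_count` could do the routing without a separate aggregation branch in the job.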
jzajde1
Premium Member
Posts: 21
Joined: Wed Jan 07, 2015 8:10 am

Post by jzajde1 »

ShaneMuir:

The source is a sequential file.