Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.
Moderators: chulett , rschirm , roy
jzajde1
Premium Member
Posts: 21 Joined: Wed Jan 07, 2015 8:10 am
by jzajde1 » Tue Feb 10, 2015 2:58 pm
Hello,
Is there a way I can retain both duplicates from a stage in DataStage?
The primary key is column 1.
Ex.
Column1|Column2|Column3
111|EA|203
111|EA|201
112|EA|200
113|EA|200
I want to remove both records where column 1 = 111.
Please advise.
Thanks
qt_ky
Premium Member
Posts: 2895 Joined: Wed Aug 03, 2011 6:16 am
Location: USA
by qt_ky » Tue Feb 10, 2015 3:46 pm
Could you clarify whether you want to retain (keep) or remove the duplicates, or something in between, such as routing one or both to their own separate stage?
Choose a job you love, and you will never have to work a day in your life. - Confucius
ray.wurlod
Participant
Posts: 54607 Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
by ray.wurlod » Tue Feb 10, 2015 4:24 pm
Create a fork-join to identify the row count for each key. Downstream of the Join, add a filter that passes only those key values for which the count is 1.
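For readers unfamiliar with the pattern, here is a minimal sketch of the fork-join logic in plain Python rather than DataStage stages. It uses the sample rows from the original question; the function name `fork_join_split` and its `key_index` parameter are made up for illustration. In a parallel job the same steps would presumably map to a Copy stage (fork), an Aggregator (count rows per key), a Join on the key, and a Filter or Transformer downstream.

```python
from collections import Counter

# Sample rows from the original question: (Column1, Column2, Column3)
rows = [
    ("111", "EA", "203"),
    ("111", "EA", "201"),
    ("112", "EA", "200"),
    ("113", "EA", "200"),
]


def fork_join_split(rows, key_index=0):
    """Hypothetical helper: split rows into unique-key and duplicate-key sets."""
    # Fork: one branch counts the rows per key (the aggregation step).
    counts = Counter(row[key_index] for row in rows)
    # Join: attach each key's count back onto the full rows from the other branch.
    joined = [(row, counts[row[key_index]]) for row in rows]
    # Filter: count == 1 passes keys that occur once; count > 1 catches duplicates.
    unique_rows = [row for row, n in joined if n == 1]
    duplicate_rows = [row for row, n in joined if n > 1]
    return unique_rows, duplicate_rows


unique_rows, duplicate_rows = fork_join_split(rows)
```

Filtering on a count of 1 removes both 111 records; the inverse condition (count greater than 1) collects both of them instead, so the same design covers either reading of the question.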
Last edited by
ray.wurlod on Tue Feb 10, 2015 4:52 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085 Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO
by chulett » Tue Feb 10, 2015 4:48 pm
Yup, fork that join.
-craig
"You can never have too many knives" -- Logan Nine Fingers
jzajde1
Premium Member
Posts: 21 Joined: Wed Jan 07, 2015 8:10 am
by jzajde1 » Wed Feb 11, 2015 6:50 am
qt_ky:
I want to retain (keep) the records and route them to their own stage.
chulett & ray.wurlod: thank you for your post. I will test the fork join and reply.
ShaneMuir
Premium Member
Posts: 508 Joined: Tue Jun 15, 2004 5:00 am
Location: London
by ShaneMuir » Wed Feb 11, 2015 8:19 am
Just as a question, what is the data source in this process? If it's a DB, there might be ways of avoiding a split fork-join by incorporating the identification of potential duplicates into your select query.
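As a rough illustration of what that could look like, the sketch below runs such a query against an in-memory SQLite database from Python. The table name `src`, the lowercase column names, and the `key_count` alias are all assumptions made for the example; the actual source and SQL dialect are not known from this thread.

```python
import sqlite3

# Hypothetical table standing in for a database source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (column1 TEXT, column2 TEXT, column3 TEXT)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?)",
    [
        ("111", "EA", "203"),
        ("111", "EA", "201"),
        ("112", "EA", "200"),
        ("113", "EA", "200"),
    ],
)

# The subquery counts rows per key; joining it back tags every row
# with the number of rows that share its key.
query = """
SELECT s.column1, s.column2, s.column3, k.key_count
FROM src AS s
JOIN (
    SELECT column1, COUNT(*) AS key_count
    FROM src
    GROUP BY column1
) AS k ON k.column1 = s.column1
ORDER BY s.column1, s.column3
"""
flagged = conn.execute(query).fetchall()

# Rows whose key occurs more than once are the duplicates.
duplicates = [row[:3] for row in flagged if row[3] > 1]
conn.close()
```

With the count already on each row, the job would only need a single filter downstream instead of the fork and the join.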
jzajde1
Premium Member
Posts: 21 Joined: Wed Jan 07, 2015 8:10 am
by jzajde1 » Wed Feb 11, 2015 9:56 am
ShaneMuir:
The source is a sequential file.