Sequence number to duplicates

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sunnymdatastage
Premium Member
Premium Member
Posts: 19
Joined: Tue Jul 16, 2013 10:21 am

Sequence number to duplicates

Post by sunnymdatastage »

I have a scenario based question. I have an Input ID and want to add a seqID for every duplicate. If there are no duplicates then seqid should be 1 else if duplicate is found then increment seqid by 1. Below is the example of how the input and output may look. I would appreciate if someone could help me on this.
Input

Id
1
1
2
3
3
4
*************************************
Output

Id seqid
1 1
1 2
2 1
3 1
3 2
3 3
4 1
sunnym
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's at least a couple of ways to do this. First would be more Old School and involves using stage variables to do 'group change detection'. Keep the old value from the previous row and compare it to the current value. Same? Add 1 to your ID. Different? Set your ID to 1. Do an exact search here for that term to see examples.

You can also leverage the optional 'Key Change column' that the Sort stage can add to the output as a new field that will show True (1) when a new group starts and False (0) when it is the same as the previous group value. As before, when it is 1 set your ID to 1, when it is 0 add 1 to your current ID.

Running your job on multiple nodes can complicate this but you just need to hash partition the data on the grouping field(s) to keep everything working.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sunnymdatastage
Premium Member
Premium Member
Posts: 19
Joined: Tue Jul 16, 2013 10:21 am

Post by sunnymdatastage »

I need a higher level answer to this please.
sunnym
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Short of answering using all upper-case I don't know how any "higher level" might be achieved. Craig's answer seems singularly complete to me.

DSXchange is not here to do your work for you (though some consultants who post here may do so for money). DSXchange will help, as Craig has done.

What other question do you have about what Craig posted (and about which a Search will uncover details of techniques)?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply