Sequence number to duplicates

sunnymdatastage · Post by **sunnymdatastage** » Mon Oct 28, 2013 1:39 pm

I have a scenario based question. I have an Input ID and want to add a seqID for every duplicate. If there are no duplicates then seqid should be 1 else if duplicate is found then increment seqid by 1. Below is the example of how the input and output may look. I would appreciate if someone could help me on this.
Input

Id
1
1
2
3
3
4
*************************************
Output

Id seqid
1 1
1 2
2 1
3 1
3 2
3 3
4 1

chulett · Post by **chulett** » Mon Oct 28, 2013 2:38 pm

There's at least a couple of ways to do this. First would be more Old School and involves using stage variables to do 'group change detection'. Keep the old value from the previous row and compare it to the current value. Same? Add 1 to your ID. Different? Set your ID to 1. Do an exact search here for that term to see examples.

You can also leverage the optional 'Key Change column' that the Sort stage can add to the output as a new field that will show True (1) when a new group starts and False (0) when it is the same as the previous group value. As before, when it is 1 set your ID to 1, when it is 0 add 1 to your current ID.

Running your job on multiple nodes can complicate this but you just need to hash partition the data on the grouping field(s) to keep everything working.

sunnymdatastage · Post by **sunnymdatastage** » Tue Oct 29, 2013 9:35 am

I need a higher level answer to this please.

ray.wurlod · Post by **ray.wurlod** » Tue Oct 29, 2013 2:05 pm

Short of answering using all upper-case I don't know how any "higher level" might be achieved. Craig's answer seems singularly complete to me.

DSXchange is not here to do your work for you (though some consultants who post here may do so for money). DSXchange will help, as Craig has done.

What other question do you have about what Craig posted (and about which a Search will uncover details of techniques)?