I have a scenario based question. I have an Input ID and want to add a seqID for every duplicate. If there are no duplicates then seqid should be 1 else if duplicate is found then increment seqid by 1. Below is the example of how the input and output may look. I would appreciate if someone could help me on this.
Input
Id
1
1
2
3
3
4
*************************************
Output
Id seqid
1 1
1 2
2 1
3 1
3 2
3 3
4 1
Sequence number to duplicates
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 19
- Joined: Tue Jul 16, 2013 10:21 am
There's at least a couple of ways to do this. First would be more Old School and involves using stage variables to do 'group change detection'. Keep the old value from the previous row and compare it to the current value. Same? Add 1 to your ID. Different? Set your ID to 1. Do an exact search here for that term to see examples.
You can also leverage the optional 'Key Change column' that the Sort stage can add to the output as a new field that will show True (1) when a new group starts and False (0) when it is the same as the previous group value. As before, when it is 1 set your ID to 1, when it is 0 add 1 to your current ID.
Running your job on multiple nodes can complicate this but you just need to hash partition the data on the grouping field(s) to keep everything working.
You can also leverage the optional 'Key Change column' that the Sort stage can add to the output as a new field that will show True (1) when a new group starts and False (0) when it is the same as the previous group value. As before, when it is 1 set your ID to 1, when it is 0 add 1 to your current ID.
Running your job on multiple nodes can complicate this but you just need to hash partition the data on the grouping field(s) to keep everything working.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Premium Member
- Posts: 19
- Joined: Tue Jul 16, 2013 10:21 am
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Short of answering using all upper-case I don't know how any "higher level" might be achieved. Craig's answer seems singularly complete to me.
DSXchange is not here to do your work for you (though some consultants who post here may do so for money). DSXchange will help, as Craig has done.
What other question do you have about what Craig posted (and about which a Search will uncover details of techniques)?
DSXchange is not here to do your work for you (though some consultants who post here may do so for money). DSXchange will help, as Craig has done.
What other question do you have about what Craig posted (and about which a Search will uncover details of techniques)?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.