my data is like this
abc|123|1
abc|123|1
abc|123|2
abc|456|1
abc|456|2
abc|456|3
My output should be:
abc|123|1^1
abc|123|1^2
abc|123|2^3
abc|456|1^4
abc|456|2^5
abc|456|3^6
Thanks
Assign number to duplicate records
Moderators: chulett, rschirm, roy
rkashyap, here is data with column header,
Col1 Col2 Col3
abc 123 1
abc 123 1
abc 123 2
abc 456 1
abc 456 2
abc 456 3
Output should be into one column:
Col_output
abc|123|1^1
abc|123|1^2
abc|123|2^3
abc|456|1^4
abc|456|2^5
abc|456|3^6
Requirements: need to find duplicates based on Col1 and give the numbering based on the Col2, Col3 (sort order)..Hope I am clear..pls let me know if you need more info...
Col1 Col2 Col3
abc 123 1
abc 123 1
abc 123 2
abc 456 1
abc 456 2
abc 456 3
Output should be into one column:
Col_output
abc|123|1^1
abc|123|1^2
abc|123|2^3
abc|456|1^4
abc|456|2^5
abc|456|3^6
Requirements: need to find duplicates based on Col1 and give the numbering based on the Col2, Col3 (sort order)..Hope I am clear..pls let me know if you need more info...
In a transformer -
1. Sort incoming data on the key (Hash partitioning).
2. Create a stage variable to act as Counter. Compare value of Col1 for current-row and previous-row and in case of break set value of counter to 1 else increment the counter by 1.
3. Append the incoming columns and Counter and write the same to output.
1. Sort incoming data on the key (Hash partitioning).
2. Create a stage variable to act as Counter. Compare value of Col1 for current-row and previous-row and in case of break set value of counter to 1 else increment the counter by 1.
3. Append the incoming columns and Counter and write the same to output.
Last edited by rkashyap on Fri Apr 04, 2014 10:03 am, edited 1 time in total.
I agree with Andy.
Double-check your output example - right now it doesn't support the assertation that you need to "assign number to duplicate records". They're just numbered sequentially without any apparent regards to duplicates. And then let us know what you've tried so far.
Double-check your output example - right now it doesn't support the assertation that you need to "assign number to duplicate records". They're just numbered sequentially without any apparent regards to duplicates. And then let us know what you've tried so far.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Actually I have mentioned one set of records as a sample, data looks like this:asorrell wrote:It looks like all you are doing is numbering the rows. You'll have to set the job to execute in single-threaded mode and append @INROWNUM and your separator.
Col1 Col2 Col3
abc 123 1
zzz 234 3
abc 123 1
999 895 1
abc 123 2
abc 456 1
999 895 3
abc 456 2
abc 456 3
zzz 123 1
Output should be into one column:
Col_output
abc|123|1^1
abc|123|1^2
abc|123|2^3
abc|456|1^4
abc|456|2^5
abc|456|3^6
zzz|123|1^1
zzz|234|3^2
999|895|1^1
999|895|3^2
Then as noted, sort the data and have the Sort stage add a Key Change column on "Col1". Then it's a simple matter of checking the value in the Key Change column and when it is a 1, set your counter to 1. When it is a 0, increment your counter by 1.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers