Page 1 of 1

Assign number to duplicate records

Posted: Fri Apr 04, 2014 9:08 am
by rajkraj
my data is like this

abc|123|1
abc|123|1
abc|123|2
abc|456|1
abc|456|2
abc|456|3

My output should be:

abc|123|1^1
abc|123|1^2
abc|123|2^3
abc|456|1^4
abc|456|2^5
abc|456|3^6

Thanks

Posted: Fri Apr 04, 2014 9:29 am
by rkashyap
Please add column headers to the data, define key for identifying duplicates and explain the requirements.

Posted: Fri Apr 04, 2014 9:48 am
by rajkraj
rkashyap, here is data with column header,

Col1 Col2 Col3

abc 123 1
abc 123 1
abc 123 2
abc 456 1
abc 456 2
abc 456 3

Output should be into one column:

Col_output
abc|123|1^1
abc|123|1^2
abc|123|2^3
abc|456|1^4
abc|456|2^5
abc|456|3^6

Requirements: need to find duplicates based on Col1 and give the numbering based on the Col2, Col3 (sort order)..Hope I am clear..pls let me know if you need more info...

Posted: Fri Apr 04, 2014 9:55 am
by asorrell
It looks like all you are doing is numbering the rows. You'll have to set the job to execute in single-threaded mode and append @INROWNUM and your separator.

Posted: Fri Apr 04, 2014 10:02 am
by rkashyap
In a transformer -

1. Sort incoming data on the key (Hash partitioning).
2. Create a stage variable to act as Counter. Compare value of Col1 for current-row and previous-row and in case of break set value of counter to 1 else increment the counter by 1.
3. Append the incoming columns and Counter and write the same to output.

Posted: Fri Apr 04, 2014 10:02 am
by chulett
I agree with Andy.

Double-check your output example - right now it doesn't support the assertation that you need to "assign number to duplicate records". They're just numbered sequentially without any apparent regards to duplicates. And then let us know what you've tried so far.

Posted: Fri Apr 04, 2014 10:20 am
by rajkraj
asorrell wrote:It looks like all you are doing is numbering the rows. You'll have to set the job to execute in single-threaded mode and append @INROWNUM and your separator.
Actually I have mentioned one set of records as a sample, data looks like this:

Col1 Col2 Col3

abc 123 1
zzz 234 3
abc 123 1
999 895 1
abc 123 2
abc 456 1
999 895 3
abc 456 2
abc 456 3
zzz 123 1

Output should be into one column:

Col_output
abc|123|1^1
abc|123|1^2
abc|123|2^3
abc|456|1^4
abc|456|2^5
abc|456|3^6

zzz|123|1^1
zzz|234|3^2

999|895|1^1
999|895|3^2

Posted: Fri Apr 04, 2014 10:52 am
by chulett
Then as noted, sort the data and have the Sort stage add a Key Change column on "Col1". Then it's a simple matter of checking the value in the Key Change column and when it is a 1, set your counter to 1. When it is a 0, increment your counter by 1.