Assign number to duplicate records

rajkraj · Post by **rajkraj** » Fri Apr 04, 2014 9:08 am

my data is like this

abc|123|1
abc|123|1
abc|123|2
abc|456|1
abc|456|2
abc|456|3

My output should be:

abc|123|1^1
abc|123|1^2
abc|123|2^3
abc|456|1^4
abc|456|2^5
abc|456|3^6

Thanks

rkashyap · Post by **rkashyap** » Fri Apr 04, 2014 9:29 am

Please add column headers to the data, define key for identifying duplicates and explain the requirements.

rajkraj · Post by **rajkraj** » Fri Apr 04, 2014 9:48 am

rkashyap, here is data with column header,

Col1 Col2 Col3

abc 123 1
abc 123 1
abc 123 2
abc 456 1
abc 456 2
abc 456 3

Output should be into one column:

Col_output
abc|123|1^1
abc|123|1^2
abc|123|2^3
abc|456|1^4
abc|456|2^5
abc|456|3^6

Requirements: need to find duplicates based on Col1 and give the numbering based on the Col2, Col3 (sort order)..Hope I am clear..pls let me know if you need more info...

IBM Analytics Champion 2009 - 2020 · Post by **asorrell** » Fri Apr 04, 2014 9:55 am

It looks like all you are doing is numbering the rows. You'll have to set the job to execute in single-threaded mode and append @INROWNUM and your separator.

rkashyap · Post by **rkashyap** » Fri Apr 04, 2014 10:02 am

In a transformer -

1. Sort incoming data on the key (Hash partitioning).
2. Create a stage variable to act as Counter. Compare value of Col1 for current-row and previous-row and in case of break set value of counter to 1 else increment the counter by 1.
3. Append the incoming columns and Counter and write the same to output.

chulett · Post by **chulett** » Fri Apr 04, 2014 10:02 am

I agree with Andy.

Double-check your output example - right now it doesn't support the assertation that you need to "assign number to duplicate records". They're just numbered sequentially without any apparent regards to duplicates. And then let us know what you've tried so far.

rajkraj · Post by **rajkraj** » Fri Apr 04, 2014 10:20 am

asorrell wrote:It looks like all you are doing is numbering the rows. You'll have to set the job to execute in single-threaded mode and append @INROWNUM and your separator.

Actually I have mentioned one set of records as a sample, data looks like this:

Col1 Col2 Col3

abc 123 1
zzz 234 3
abc 123 1
999 895 1
abc 123 2
abc 456 1
999 895 3
abc 456 2
abc 456 3
zzz 123 1

Output should be into one column:

Col_output
abc|123|1^1
abc|123|1^2
abc|123|2^3
abc|456|1^4
abc|456|2^5
abc|456|3^6

zzz|123|1^1
zzz|234|3^2

999|895|1^1
999|895|3^2

chulett · Post by **chulett** » Fri Apr 04, 2014 10:52 am

Then as noted, sort the data and have the Sort stage add a Key Change column on "Col1". Then it's a simple matter of checking the value in the Key Change column and when it is a 1, set your counter to 1. When it is a 0, increment your counter by 1.