Assign number to duplicate records

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
rajkraj
Premium Member
Premium Member
Posts: 98
Joined: Wed Jun 15, 2005 1:41 pm

Assign number to duplicate records

Post by rajkraj »

my data is like this

abc|123|1
abc|123|1
abc|123|2
abc|456|1
abc|456|2
abc|456|3

My output should be:

abc|123|1^1
abc|123|1^2
abc|123|2^3
abc|456|1^4
abc|456|2^5
abc|456|3^6

Thanks
rkashyap
Premium Member
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

Please add column headers to the data, define key for identifying duplicates and explain the requirements.
rajkraj
Premium Member
Premium Member
Posts: 98
Joined: Wed Jun 15, 2005 1:41 pm

Post by rajkraj »

rkashyap, here is data with column header,

Col1 Col2 Col3

abc 123 1
abc 123 1
abc 123 2
abc 456 1
abc 456 2
abc 456 3

Output should be into one column:

Col_output
abc|123|1^1
abc|123|1^2
abc|123|2^3
abc|456|1^4
abc|456|2^5
abc|456|3^6

Requirements: need to find duplicates based on Col1 and give the numbering based on the Col2, Col3 (sort order)..Hope I am clear..pls let me know if you need more info...
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

It looks like all you are doing is numbering the rows. You'll have to set the job to execute in single-threaded mode and append @INROWNUM and your separator.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
rkashyap
Premium Member
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

In a transformer -

1. Sort incoming data on the key (Hash partitioning).
2. Create a stage variable to act as Counter. Compare value of Col1 for current-row and previous-row and in case of break set value of counter to 1 else increment the counter by 1.
3. Append the incoming columns and Counter and write the same to output.
Last edited by rkashyap on Fri Apr 04, 2014 10:03 am, edited 1 time in total.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I agree with Andy.

Double-check your output example - right now it doesn't support the assertation that you need to "assign number to duplicate records". They're just numbered sequentially without any apparent regards to duplicates. And then let us know what you've tried so far.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rajkraj
Premium Member
Premium Member
Posts: 98
Joined: Wed Jun 15, 2005 1:41 pm

Post by rajkraj »

asorrell wrote:It looks like all you are doing is numbering the rows. You'll have to set the job to execute in single-threaded mode and append @INROWNUM and your separator.
Actually I have mentioned one set of records as a sample, data looks like this:

Col1 Col2 Col3

abc 123 1
zzz 234 3
abc 123 1
999 895 1
abc 123 2
abc 456 1
999 895 3
abc 456 2
abc 456 3
zzz 123 1

Output should be into one column:

Col_output
abc|123|1^1
abc|123|1^2
abc|123|2^3
abc|456|1^4
abc|456|2^5
abc|456|3^6

zzz|123|1^1
zzz|234|3^2

999|895|1^1
999|895|3^2
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Then as noted, sort the data and have the Sort stage add a Key Change column on "Col1". Then it's a simple matter of checking the value in the Key Change column and when it is a 1, set your counter to 1. When it is a 0, increment your counter by 1.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply