GET N rows in output based on counts

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sunnymdatastage
Premium Member
Premium Member
Posts: 19
Joined: Tue Jul 16, 2013 10:21 am

GET N rows in output based on counts

Post by sunnymdatastage »

Example:
Input:
col1 col2 col3 col4
10 10/01/2011 10/31/2011 2
10 11/01/2011 11/31/2011 2
10 12/01/2011 12/31/2011 2
10 01/01/2012 01/31/2012 2

11 10/01/2011 10/31/2011 5
11 11/01/2011 11/31/2011 5
11 12/01/2011 12/31/2011 5
11 01/01/2012 01/31/2012 5
11 02/01/2011 02/31/2011 5
11 03/01/2011 03/31/2011 5
11 04/01/2011 04/31/2011 5
11 05/01/2012 05/31/2012 5


Output:(based on col4)

col1 col2 col3 col4
10 10/01/2011 10/31/2011 2
10 11/01/2011 11/31/2011 2


11 10/01/2011 10/31/2011 5
11 11/01/2011 11/31/2011 5
11 12/01/2011 12/31/2011 5
11 01/01/2012 01/31/2012 5
11 02/01/2011 02/31/2011 5

I have a scenario shown in the above example. If you check the input,Col 4 has the counts for which in every group of COL1, i need to select the first(col4). So if the COL4 has 2,I will select the first 2 from the COL1 group. If COL4 is 6 I will select first 6 2 from the COL1 group.
Is this possible in a transformer stage??
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Yes, this is possible and not particularly difficult to implement.

First, use stage variables to detect group changes in COL1.
If the group has changed, set stage var RowsToSend TO COL4's value,
otherwise decrement the value of COL4 by 1.
Make a Constraint of "RowsToSend>0"

The details and error checking need to be added, but those are the only steps you need perform. Note that there are different ways to do this, what I posted is just an example of how I would approach it.
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Post by ssnegi »

Transformer : Hash partition on col1 and sort only ascending on the fields col2 & col3:

create stage variables :
svcnt : if input.col1 <> svcol1 then 1 else svcnt+1
svcol1 : input.col1
svoutput : if svcnt <= input.svcol4 then 0 else 1

Then put constraint :
svoutput = 0
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Please do not send private message with the same question unless you are seeking to hire services.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
askvishal
Participant
Posts: 8
Joined: Wed Oct 23, 2013 5:52 am
Location: Chennai

Post by askvishal »

Use sort stage (ascending Order) with Row change column- RC.

Then a transformer stage with constraint to output link as :-

RC=1 and @iteration<=Col1
map the input output columns.

Please let me know if it is done succesfully.
sunnymdatastage
Premium Member
Premium Member
Posts: 19
Joined: Tue Jul 16, 2013 10:21 am

Post by sunnymdatastage »

None of these suggestions worked. This is a typical example where I am not able to find any specific suggestion in any forum. I would also like to add that the COL1 may not be always +1, in actual it can be 10,15,45,64 etc
sunnym
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You've gotten specific suggestions right here in this thread. Please be more specific with regards to what "did not work" about them. As Arnd noted, this seems pretty straight-forward to me so I'd be curious what issues you had with his approach in particular.

And ps - none of the suggestions assumed COL1 was any kind of "+1", all they asked you to detect was when it changed.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sunnymdatastage
Premium Member
Premium Member
Posts: 19
Joined: Tue Jul 16, 2013 10:21 am

Post by sunnymdatastage »

Suggestion 1 passes 0 records.
suggestion 2 passes all the records that I have in the input.
suggestion 3 passes less number of records the the count I have. However this was much closer but this gives only those records in output where keychange=1.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Then I would wager the suggestions were not properly implemented but there's no way to know that without including details of what you actually attempted. Let's stick with "Suggestion 1" for the moment, can you show us the stage variables, their derivations and the constraint you used? You'll also need to partition this properly for it to work, that or run the transformer in sequential mode... or run the job on a single node.
-craig

"You can never have too many knives" -- Logan Nine Fingers
askvishal
Participant
Posts: 8
Joined: Wed Oct 23, 2013 5:52 am
Location: Chennai

Post by askvishal »

Please select all the 4 columns as the key in Sort stage for the suggestion mentioned above and try.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

For that suggestion, only the first column should be the 'key' in the Sort change in order to generate a proper Key Change column.
-craig

"You can never have too many knives" -- Logan Nine Fingers
askvishal
Participant
Posts: 8
Joined: Wed Oct 23, 2013 5:52 am
Location: Chennai

Post by askvishal »

My bad... ignore the previous suggestion.. the older suggestion should give you the desired result. Please try again
Post Reply