Page 1 of 1

GET N rows in output based on counts

Posted: Wed Jun 04, 2014 12:48 pm
by sunnymdatastage
Example:
Input:
col1 col2 col3 col4
10 10/01/2011 10/31/2011 2
10 11/01/2011 11/31/2011 2
10 12/01/2011 12/31/2011 2
10 01/01/2012 01/31/2012 2

11 10/01/2011 10/31/2011 5
11 11/01/2011 11/31/2011 5
11 12/01/2011 12/31/2011 5
11 01/01/2012 01/31/2012 5
11 02/01/2011 02/31/2011 5
11 03/01/2011 03/31/2011 5
11 04/01/2011 04/31/2011 5
11 05/01/2012 05/31/2012 5


Output:(based on col4)

col1 col2 col3 col4
10 10/01/2011 10/31/2011 2
10 11/01/2011 11/31/2011 2


11 10/01/2011 10/31/2011 5
11 11/01/2011 11/31/2011 5
11 12/01/2011 12/31/2011 5
11 01/01/2012 01/31/2012 5
11 02/01/2011 02/31/2011 5

I have a scenario shown in the above example. If you check the input,Col 4 has the counts for which in every group of COL1, i need to select the first(col4). So if the COL4 has 2,I will select the first 2 from the COL1 group. If COL4 is 6 I will select first 6 2 from the COL1 group.
Is this possible in a transformer stage??

Posted: Wed Jun 04, 2014 1:12 pm
by ArndW
Yes, this is possible and not particularly difficult to implement.

First, use stage variables to detect group changes in COL1.
If the group has changed, set stage var RowsToSend TO COL4's value,
otherwise decrement the value of COL4 by 1.
Make a Constraint of "RowsToSend>0"

The details and error checking need to be added, but those are the only steps you need perform. Note that there are different ways to do this, what I posted is just an example of how I would approach it.

Posted: Wed Jun 04, 2014 2:25 pm
by ssnegi
Transformer : Hash partition on col1 and sort only ascending on the fields col2 & col3:

create stage variables :
svcnt : if input.col1 <> svcol1 then 1 else svcnt+1
svcol1 : input.col1
svoutput : if svcnt <= input.svcol4 then 0 else 1

Then put constraint :
svoutput = 0

Posted: Wed Jun 04, 2014 3:22 pm
by ray.wurlod
Please do not send private message with the same question unless you are seeking to hire services.

Posted: Thu Jun 05, 2014 1:15 am
by askvishal
Use sort stage (ascending Order) with Row change column- RC.

Then a transformer stage with constraint to output link as :-

RC=1 and @iteration<=Col1
map the input output columns.

Please let me know if it is done succesfully.

Posted: Mon Jun 16, 2014 2:11 pm
by sunnymdatastage
None of these suggestions worked. This is a typical example where I am not able to find any specific suggestion in any forum. I would also like to add that the COL1 may not be always +1, in actual it can be 10,15,45,64 etc

Posted: Mon Jun 16, 2014 2:53 pm
by chulett
You've gotten specific suggestions right here in this thread. Please be more specific with regards to what "did not work" about them. As Arnd noted, this seems pretty straight-forward to me so I'd be curious what issues you had with his approach in particular.

And ps - none of the suggestions assumed COL1 was any kind of "+1", all they asked you to detect was when it changed.

Posted: Mon Jun 16, 2014 2:58 pm
by sunnymdatastage
Suggestion 1 passes 0 records.
suggestion 2 passes all the records that I have in the input.
suggestion 3 passes less number of records the the count I have. However this was much closer but this gives only those records in output where keychange=1.

Posted: Mon Jun 16, 2014 3:32 pm
by chulett
Then I would wager the suggestions were not properly implemented but there's no way to know that without including details of what you actually attempted. Let's stick with "Suggestion 1" for the moment, can you show us the stage variables, their derivations and the constraint you used? You'll also need to partition this properly for it to work, that or run the transformer in sequential mode... or run the job on a single node.

Posted: Tue Jun 17, 2014 6:48 am
by askvishal
Please select all the 4 columns as the key in Sort stage for the suggestion mentioned above and try.

Posted: Tue Jun 17, 2014 6:56 am
by chulett
For that suggestion, only the first column should be the 'key' in the Sort change in order to generate a proper Key Change column.

Posted: Tue Jun 17, 2014 11:11 pm
by askvishal
My bad... ignore the previous suggestion.. the older suggestion should give you the desired result. Please try again