GET N rows in output based on counts
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 19
- Joined: Tue Jul 16, 2013 10:21 am
Example:
Input:
col1 col2 col3 col4
10 10/01/2011 10/31/2011 2
10 11/01/2011 11/31/2011 2
10 12/01/2011 12/31/2011 2
10 01/01/2012 01/31/2012 2
11 10/01/2011 10/31/2011 5
11 11/01/2011 11/31/2011 5
11 12/01/2011 12/31/2011 5
11 01/01/2012 01/31/2012 5
11 02/01/2011 02/31/2011 5
11 03/01/2011 03/31/2011 5
11 04/01/2011 04/31/2011 5
11 05/01/2012 05/31/2012 5
Output (based on col4):
col1 col2 col3 col4
10 10/01/2011 10/31/2011 2
10 11/01/2011 11/31/2011 2
11 10/01/2011 10/31/2011 5
11 11/01/2011 11/31/2011 5
11 12/01/2011 12/31/2011 5
11 01/01/2012 01/31/2012 5
11 02/01/2011 02/31/2011 5
I have the scenario shown in the example above. In the input, COL4 holds a count: for each COL1 group, I need to select the first COL4 rows. So if COL4 is 2, I select the first 2 rows of the COL1 group; if COL4 is 6, I select the first 6 rows of the COL1 group.
Is this possible in a Transformer stage?
Yes, this is possible and not particularly difficult to implement.
First, use stage variables to detect group changes in COL1.
If the group has changed, set the stage variable RowsToSend to COL4's value;
otherwise decrement RowsToSend by 1.
Make a constraint of "RowsToSend > 0".
The details and error checking need to be added, but those are the only steps you need to perform. Note that there are different ways to do this; what I posted is just an example of how I would approach it.
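To make the steps concrete, here is a minimal sketch of the same logic in plain Python rather than in Transformer derivations. It only emulates the behaviour: the names select_first_n, prev_key and rows_to_send are made up for illustration and are not DataStage functions, and it assumes the rows arrive already grouped by COL1.

```python
# Sketch only: emulates the stage-variable logic in plain Python.
# select_first_n, prev_key and rows_to_send are illustrative names,
# not DataStage APIs. Rows must arrive grouped by col1.

def select_first_n(rows):
    """Keep the first col4 rows of each col1 group."""
    out = []
    prev_key = object()      # sentinel: never equal to a real key
    rows_to_send = 0         # plays the role of the RowsToSend stage variable
    for row in rows:
        col1, col4 = row[0], row[3]
        if col1 != prev_key:
            rows_to_send = col4      # group changed: reload the counter
        else:
            rows_to_send -= 1        # same group: one fewer row left to send
        prev_key = col1              # remember the key for the next row
        if rows_to_send > 0:         # the output link constraint
            out.append(row)
    return out


# Sample data copied from the question (dates kept as plain strings).
rows = [
    (10, "10/01/2011", "10/31/2011", 2),
    (10, "11/01/2011", "11/31/2011", 2),
    (10, "12/01/2011", "12/31/2011", 2),
    (10, "01/01/2012", "01/31/2012", 2),
    (11, "10/01/2011", "10/31/2011", 5),
    (11, "11/01/2011", "11/31/2011", 5),
    (11, "12/01/2011", "12/31/2011", 5),
    (11, "01/01/2012", "01/31/2012", 5),
    (11, "02/01/2011", "02/31/2011", 5),
    (11, "03/01/2011", "03/31/2011", 5),
    (11, "04/01/2011", "04/31/2011", 5),
    (11, "05/01/2012", "05/31/2012", 5),
]

result = select_first_n(rows)
for row in result:
    print(*row)   # 2 rows for key 10, then 5 rows for key 11
```

In a real Transformer, stage variables are evaluated top to bottom for each row, so the variable holding the previous COL1 value would need to be updated after the one that compares against it.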
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Premium Member
- Posts: 19
- Joined: Tue Jul 16, 2013 10:21 am
You've gotten specific suggestions right here in this thread. Please be more specific with regard to what "did not work" about them. As Arnd noted, this seems pretty straightforward to me, so I'd be curious what issues you had with his approach in particular.
P.S. None of the suggestions assumed COL1 was any kind of "+1"; all they asked you to detect was when it changed.
-craig
"You can never have too many knives" -- Logan Nine Fingers
-
- Premium Member
- Posts: 19
- Joined: Tue Jul 16, 2013 10:21 am
Then I would wager the suggestions were not properly implemented, but there's no way to know that without details of what you actually attempted. Let's stick with "Suggestion 1" for the moment: can you show us the stage variables, their derivations and the constraint you used? You'll also need to partition this properly for it to work, or else run the transformer in sequential mode, or run the job on a single node.
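As a rough illustration of why partitioning matters here, the sketch below runs the same counter logic independently on each of two simulated nodes. Everything in it is an assumption made for the example: the helper names are invented, the rows are reduced to (COL1, COL4) pairs, and "key modulo node count" merely stands in for a real key-based (hash) partitioner.

```python
# Sketch only: simulates per-node execution of the counter logic.
# All names are illustrative; nothing here is a DataStage API.

def first_n_per_group(rows):
    """Counter logic as run by one node on its own slice of the data."""
    out, prev_key, remaining = [], object(), 0
    for key, count in rows:
        remaining = count if key != prev_key else remaining - 1
        prev_key = key
        if remaining > 0:
            out.append((key, count))
    return out

def run_partitioned(rows, nodes, partitioner):
    """Split rows across nodes, process each slice separately, collect results."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[partitioner(i, row) % nodes].append(row)
    return [r for part in parts for r in first_n_per_group(part)]

# (COL1, COL4) pairs matching the shape of the sample input.
rows = [(10, 2)] * 4 + [(11, 5)] * 8

# Round robin: each group is split across both nodes, so each node
# restarts the counter on its own half of the group.
round_robin = run_partitioned(rows, 2, lambda i, row: i)

# Key-based: every row of a COL1 group lands on the same node.
by_key = run_partitioned(rows, 2, lambda i, row: row[0])

print(len(round_robin))  # 12: every input row passes the constraint
print(len(by_key))       # 7: 2 rows for key 10 plus 5 rows for key 11
```

With the group split across nodes, each node sees too few rows to exhaust its counter, so nothing is filtered out. Keeping each COL1 group on one node (or running sequentially) gives the expected 7 rows.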
-craig
"You can never have too many knives" -- Logan Nine Fingers