Hi,
We are in the process of converting our Server jobs into Parallel jobs. Some of the Server jobs use an Aggregator stage with the aggregate function 'First'.
When I convert these jobs to parallel jobs, I see no equivalent property in the parallel Aggregator stage for getting a column's value from the first record in a group.
We need a way to get the first record's column value in a group when grouping with a parallel Aggregator stage.
Thanks.
Aggregate function 'First' - parallel Aggregator stage
Moderators: chulett, rschirm, roy
Try the Remove Duplicates stage on the key columns, with the option to retain the first record.
Joshy George
You can use the Remove Duplicates stage if the data is already sorted. There you can specify, based on the key columns, whether to keep the first or the last record of each group.
If the data is not sorted, put a Sort stage before the Remove Duplicates stage to sort the data on the key columns.
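To illustrate the sort-then-keep-first idea outside of DataStage, here is a minimal Python sketch. It is not DataStage code; the function name `keep_first_per_key` and the sample columns `cust` and `amount` are made up for the example. It sorts on the key and then keeps the first row of each key group, which is roughly what a Sort stage followed by a remove-duplicates step retaining the first record would do.

```python
from itertools import groupby
from operator import itemgetter


def keep_first_per_key(rows, key):
    """Sort rows on the key, then keep only the first row of each key group."""
    # Python's sorted() is stable, so rows with the same key keep their input order.
    ordered = sorted(rows, key=itemgetter(key))
    # groupby() needs sorted input; next(group) picks the first row of each group.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


# Hypothetical sample data: two rows for customer A, one for customer B.
rows = [
    {"cust": "B", "amount": 5},
    {"cust": "A", "amount": 10},
    {"cust": "A", "amount": 7},
]
result = keep_first_per_key(rows, "cust")
print(result)
```

Which record counts as "first" depends entirely on the order within each key group, so if that matters, sort on a secondary column as well.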
Thanks for your suggestions, but I left out one point: we also need 'Sum' and 'Max' on certain columns, which is why we use the Aggregator here.
In a Server job's Aggregator, I could group by certain columns, get the 'Sum' and 'Max' for the columns we want, and use 'First' for the rest of the columns.
In the parallel job's Aggregator, we have properties for 'Sum' and 'Max', but we are stuck on the remaining columns, where we just need to pass on the 'First' value.
Any thoughts...
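To make the requirement concrete, here is a minimal Python sketch of the semantics being asked for: group on the key columns, compute a sum on one column and a max on another, and carry every other column from the first record of the group. This is only an illustration of the intended result, not DataStage code; the function name `aggregate_with_first` and the sample columns (`cust`, `qty`, `price`, `region`) are assumptions made for the example.

```python
def aggregate_with_first(rows, keys, sum_col, max_col):
    """Group rows by keys; sum one column, max another, take 'first' for the rest."""
    groups = {}
    for row in rows:
        group_key = tuple(row[c] for c in keys)
        if group_key not in groups:
            # The first record of a group seeds every column, including pass-through ones.
            groups[group_key] = dict(row)
        else:
            out = groups[group_key]
            out[sum_col] += row[sum_col]                    # running sum
            out[max_col] = max(out[max_col], row[max_col])  # running max
            # All other columns keep the value from the first record.
    return list(groups.values())


# Hypothetical sample data in arrival order.
rows = [
    {"cust": "A", "qty": 2, "price": 10, "region": "East"},
    {"cust": "A", "qty": 3, "price": 15, "region": "West"},
    {"cust": "B", "qty": 1, "price": 8, "region": "North"},
]
result = aggregate_with_first(rows, keys=["cust"], sum_col="qty", max_col="price")
print(result)
```

For customer A this yields a `qty` of 5 (sum), a `price` of 15 (max), and a `region` of East (first record), which is the combination the Server Aggregator produced in a single stage.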