
Aggregate function 'First' - parallel Aggregator stage

Posted: Mon Aug 27, 2007 6:58 am
by vnspn
Hi,

We are in the process of converting our Server jobs into Parallel jobs. Some of the Server jobs use an Aggregator stage with the aggregate function 'First'.

When I convert these jobs to parallel jobs, I see that the parallel Aggregator stage has no equivalent property for getting a column's value from the first record in a group.

We need a way to get the first record's column value within a group when grouping with a parallel Aggregator stage.

Thanks.

Posted: Mon Aug 27, 2007 8:46 am
by JoshGeorge
Try the Remove Duplicates stage with the duplicate-to-retain option set to First on the key columns.

Posted: Mon Aug 27, 2007 8:56 am
by bkumar103
You can use the Remove Duplicates stage if the data is already sorted. There you can specify, based on the key columns, whether to keep the first or the last record.
If the data is not sorted, put a Sort stage before the Remove Duplicates stage to sort the data on the key columns.
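To make the sort-then-deduplicate behaviour concrete, here is a minimal Python sketch of what a Sort stage followed by a Remove Duplicates stage does. It is not DataStage code: the function name `remove_duplicates`, the list-of-dicts row layout, and the columns `cust`, `region`, `amt` are all hypothetical. It assumes a stable sort, so "first" means the first record received within each key group.

```python
from itertools import groupby
from operator import itemgetter


def remove_duplicates(rows, keys, retain="first"):
    """Keep one row per key combination (hypothetical sketch).

    Mimics a Sort stage on `keys` followed by a Remove Duplicates
    stage that retains either the first or the last record per key.
    """
    key_fn = itemgetter(*keys)
    # sorted() is stable: rows with equal keys keep their input order,
    # so "first" is the first row received within each key group.
    ordered = sorted(rows, key=key_fn)
    result = []
    for _, group in groupby(ordered, key=key_fn):
        group = list(group)
        result.append(group[0] if retain == "first" else group[-1])
    return result


# Hypothetical sample input, in arrival order.
rows = [
    {"cust": "B", "region": "East", "amt": 5},
    {"cust": "A", "region": "West", "amt": 10},
    {"cust": "A", "region": "North", "amt": 7},
    {"cust": "B", "region": "South", "amt": 3},
]
```

With this input, `remove_duplicates(rows, ["cust"])` keeps the West row for A and the East row for B, while `retain="last"` keeps the North and South rows. The point of sorting first is that duplicates must be adjacent before the "keep first/last" decision makes sense.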

Posted: Mon Aug 27, 2007 10:06 am
by vnspn
Thanks for your suggestions, but I forgot to mention one point: we also need to compute 'Sum' and 'Max' on certain columns. That is why we use an Aggregator here.

In a Server job's Aggregator, I could group by certain columns, get the 'Sum' and 'Max' for the columns we want, and then use 'First' to get the values for the rest of the columns.
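The Server Aggregator behaviour described above (group by keys, Sum on some columns, Max on others, First on the rest) can be sketched in Python as follows. This is only an illustration of the semantics, not DataStage code: the function `aggregate`, the list-of-dicts rows, and the column names are hypothetical, and it assumes the Sum, Max and First column lists are disjoint. It runs sequentially, so "First" simply means the first row seen for each group in input order; it ignores partitioning.

```python
def aggregate(rows, keys, sum_cols, max_cols, first_cols):
    """Group rows by `keys` and compute Sum, Max and First per group.

    Hypothetical sketch of Server Aggregator semantics; assumes the
    sum/max/first column lists do not overlap.
    """
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for row in rows:
        k = tuple(row[c] for c in keys)
        if k not in groups:
            # First row of the group seeds every output column.
            out = {c: row[c] for c in keys}
            for c in sum_cols + max_cols + first_cols:
                out[c] = row[c]
            groups[k] = out
        else:
            out = groups[k]
            for c in sum_cols:
                out[c] += row[c]
            for c in max_cols:
                out[c] = max(out[c], row[c])
            # first_cols are left alone: they keep the first row's value.
    return list(groups.values())


# Hypothetical sample input, in arrival order.
rows = [
    {"cust": "A", "amt": 10, "qty": 2, "region": "West"},
    {"cust": "B", "amt": 5, "qty": 9, "region": "East"},
    {"cust": "A", "amt": 7, "qty": 4, "region": "North"},
]
result = aggregate(rows, ["cust"], ["amt"], ["qty"], ["region"])
```

Here `result` holds one row per customer: A gets amt 17 (Sum), qty 4 (Max) and region West (First), and B passes through unchanged. The 'First' columns are the ones that need no computation at all, which is the part the parallel Aggregator has no property for.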

In a parallel job's Aggregator, there are properties for 'Sum' and 'Max', but we are stuck on the remaining columns, where we just need to pass through the 'First' value.

Any thoughts?