Aggregator Stage

Posted: Tue Mar 11, 2008 11:21 pm
by gauravrb
In the Aggregator stage I am passing five input columns: A, B, C, D and E.
I am grouping on the keys A, B, C and D. After grouping, I want to retain the first value of E in each group. Is this possible? If so, which aggregate function should I use? I did not find any specific function in the Aggregator stage to retain the first value.

Posted: Wed Mar 12, 2008 1:05 am
by MOHAMMAD.ISSAQ
Can you state your requirements more clearly?

Posted: Wed Mar 12, 2008 1:25 am
by ray.wurlod
Remove Duplicates stage gives the capability to preserve the first or last in each group.

Posted: Wed Mar 12, 2008 4:01 am
by gauravrb
ray.wurlod wrote:Remove Duplicates stage gives the capability to preserve the first or last in each group. ...
So I will need to use the Remove Duplicates stage along with the Aggregator stage to meet this requirement, and this functionality cannot be achieved in the Aggregator stage alone.

Posted: Wed Mar 12, 2008 4:25 am
by ray.wurlod
INSTEAD of Aggregator stage.

Posted: Wed Mar 12, 2008 5:25 am
by gauravrb
ray.wurlod wrote:INSTEAD of Aggregator stage. ...
Sorry, I forgot to mention that I have some other columns, F and G, on which I need to aggregate for SUM and COUNT. For column E I need to retain the first value in each group. So I wanted to find out whether any function for retaining the FIRST value is present in the Aggregator stage.

Posted: Wed Mar 12, 2008 5:27 am
by ray.wurlod
Changing the specification risks invalidating the answer.

There is no First or Last set function in the parallel Aggregator stage.

Is that clear now?

Your solution will, therefore, require both stage types. Use a "fork join" design: split the stream into two, run one stream through an Aggregator stage and the other stream through a Remove Duplicates stage, then bring both streams back together in a Join stage.
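
The data flow of the fork join design can be sketched outside DataStage in plain Python. This is only an illustration of the logic, not DataStage code: the function names (aggregate, first_per_group, fork_join) and the output column names sum_F and count_G are made up for the example, and it assumes SUM is wanted on F and a non-null COUNT on G, which the thread does not spell out. It also assumes the rows arrive in an order that makes "first" meaningful within each group.

```python
def group_key(row, keys):
    """Build the grouping key (A, B, C, D) for one row."""
    return tuple(row[c] for c in keys)


def aggregate(rows, keys):
    """Aggregator branch: SUM(F) and non-null COUNT(G) per group (assumed)."""
    out = {}
    for row in rows:
        acc = out.setdefault(group_key(row, keys), {"sum_F": 0, "count_G": 0})
        acc["sum_F"] += row["F"]
        if row["G"] is not None:
            acc["count_G"] += 1
    return out


def first_per_group(rows, keys, column):
    """Remove Duplicates branch: keep the first value of `column` per group."""
    out = {}
    for row in rows:
        # setdefault stores the value only the first time a key is seen
        out.setdefault(group_key(row, keys), row[column])
    return out


def fork_join(rows, keys):
    """Fork the input into both branches, then join them on the group keys."""
    agg = aggregate(rows, keys)
    first = first_per_group(rows, keys, "E")
    return [dict(zip(keys, k), E=first[k], **agg[k]) for k in agg]
```

Both branches see the same input rows and produce exactly one row per group, so joining on A, B, C and D yields one output row per group carrying the first E alongside the aggregates.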