Aggregation

cherry · Post by **cherry** » Tue Jan 05, 2010 3:40 am

Hi All,

Could someone help with the following parallel job design?

I have the following Columns:

Input
------
A B C D
1 878 001 004
1 999 002 003
2 789 005 004
2 996 003 007

My output should be:

Group by A, with Sum(B), First(C) and First(D)

Output
--------
A B C D
1 1877 001 003
2 1785 003 004

Could someone help with how to achieve this logic in a parallel job? Server jobs have a First function; what is the equivalent in parallel jobs?

Best Regards
Cherry

vkhandel · Post by **vkhandel** » Tue Jan 05, 2010 5:26 am

Have you tried using the Aggregator stage?
You can hash partition and sort the input of the Aggregator stage on field "A", and then specify the following properties in the stage:
Grouping key = A
Aggregation type = calculation
Column for calculation = B
Sum output column = B
Column for calculation = C
Sum output column = C
Column for calculation = D
Sum output column = D

vkhandel · Post by **vkhandel** » Tue Jan 05, 2010 5:27 am

Sorry for the copy/paste error.
For C and D, you should use the following function instead:

Column for calculation = C
Minimum Value output column = C
Column for calculation = D
Minimum Value output column = D
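
To check that these settings give the output asked for, the same logic can be sketched outside DataStage. This is only an illustration in plain Python, not DataStage code: the `rows` list and the `aggregate` function are made-up names, and C and D are kept as strings on the assumption that the leading zeros matter.

```python
from itertools import groupby
from operator import itemgetter

# Sample input from the first post: columns A, B, C, D.
# C and D are strings here (an assumption) so the leading zeros survive.
rows = [
    (1, 878, "001", "004"),
    (1, 999, "002", "003"),
    (2, 789, "005", "004"),
    (2, 996, "003", "007"),
]


def aggregate(rows):
    """Group by A, sum B, and take the minimum of C and of D per group."""
    result = []
    # Sort on the key first, mirroring the sort on A ahead of the Aggregator.
    ordered = sorted(rows, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        result.append((
            key,
            sum(r[1] for r in group),   # Sum(B)
            min(r[2] for r in group),   # Minimum(C)
            min(r[3] for r in group),   # Minimum(D)
        ))
    return result


for row in aggregate(rows):
    print(*row)
```

On the sample data this prints `1 1877 001 003` and `2 1785 003 004`, which matches the requested output. Taking the first row of each group in input order would instead give C and D as 001, 004 for group 1 and 005, 004 for group 2, so the minimum is what reproduces the figures in the question.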
