Page 1 of 1

Aggregator: Multiple column aggregation into one

Posted: Thu Sep 28, 2006 7:02 pm
by vijayrc
Hi,

For the same group key in an aggregator, would it be possible to aggregate multiple columns in a record into one output column.

e.g Input
Groupkey1,Groupkey2,Groupkey3.....Amt1,Amt2,Amt3....
1111,2222,3333.....$10,$5,$3
1111,2222,3333.....$15,$3,$2
2222,3333,4444.....$1,$2,$3
2222,3333,4444.....$2,$3,$4

Ouput:
1111,2222,3333.....$38 [10+5+3+15+3+2]
2222,3333,4444.....$15 [1+2+3+2+3+$]

Is this possible ?

Posted: Thu Sep 28, 2006 8:37 pm
by DSguru2B
Ofcourse its possible. First do a normal aggregation with the aggregator grouped on the keys. This will leave you with all unique combinations of the keys. Then you can add the columns in the transformer itself into just one column. There you go. Its done.

Posted: Thu Sep 28, 2006 9:37 pm
by kumar_s
Though Aggregator, youc cannot Sum accross Columns. As suggested use Transformer for that case.

Posted: Thu Sep 28, 2006 11:14 pm
by ray.wurlod
This one really needs Aggregator ---> Transformer ---> Aggregator (though the accumulations of the grouped values could be achieved in stage variables in a Transformer stage rather than in the first Aggregator stage).

Posted: Fri Sep 29, 2006 12:41 am
by DSguru2B
Ray, i get the first aggregator which is to do normal aggregations based on the grouped keys.
I get the transformer to get the aggregations of the columns across. But whats the purpose of the last aggregator? :roll:

Posted: Fri Sep 29, 2006 1:17 am
by ray.wurlod
Well spotted, it's not necessary. I was solving a different problem.

The aggregator stage will generate:
Groupkey1,Groupkey2,Groupkey3.....Amt1,Amt2,Amt3....
1111,2222,3333.....$20,$8,$5
2222,3333,4444.....$3,$5,$7


The Transformer stage (adding the columns) will generate:
Groupkey1,Groupkey2,Groupkey3.....AmtTot
1111,2222,3333.....$38
2222,3333,4444.....$15

Make sure that the data are sorted and partitioned on GroupKey1, GroupKey2, GroupKey3 so that all members of each group are on the same processing node.

Posted: Fri Sep 29, 2006 6:19 am
by vijayrc
ray.wurlod wrote:Well spotted, it's not necessary. I was solving a different problem.

The aggregator stage will generate:
Groupkey1,Groupkey2,Groupkey3.....Amt1,Amt2,Amt3....
1111,2222,3333.....$20,$8,$5
2222 ...
Thanks all for suggestions, [though some answers, I can't see in full]. But my question is Would it be possible with Single Aggregator + anything besides Transformer [costly ?] or with Aggregator back-to-back ?

Posted: Fri Sep 29, 2006 10:56 am
by DSguru2B
As pointed out, Aggregator cannot aggregate accross columns. For that you need a transformer. The only other option that i can think of right now is to do it in the sql itself if your source is a database by writing your own sql.

Posted: Fri Sep 29, 2006 3:56 pm
by ray.wurlod
You could sum across columns in the Extraction SQL thereby avoiding the need for a Transformer stage.

Posted: Mon Oct 02, 2006 8:43 pm
by kumar_s
You may find some or other post to deny that transformer cuases Performance degradation. And more over, otherthan adding up this in the source database stage, Transformer will be easieast way to accomplish it.

Posted: Mon Oct 02, 2006 9:13 pm
by DSguru2B
Tell them Kumar man. :twisted: