Aggregator: Multiple column aggregation into one
Moderators: chulett, rschirm, roy
Aggregator: Multiple column aggregation into one
Hi,
For the same group key in an aggregator, would it be possible to aggregate multiple columns in a record into one output column.
e.g Input
Groupkey1,Groupkey2,Groupkey3.....Amt1,Amt2,Amt3....
1111,2222,3333.....$10,$5,$3
1111,2222,3333.....$15,$3,$2
2222,3333,4444.....$1,$2,$3
2222,3333,4444.....$2,$3,$4
Ouput:
1111,2222,3333.....$38 [10+5+3+15+3+2]
2222,3333,4444.....$15 [1+2+3+2+3+$]
Is this possible ?
For the same group key in an aggregator, would it be possible to aggregate multiple columns in a record into one output column.
e.g Input
Groupkey1,Groupkey2,Groupkey3.....Amt1,Amt2,Amt3....
1111,2222,3333.....$10,$5,$3
1111,2222,3333.....$15,$3,$2
2222,3333,4444.....$1,$2,$3
2222,3333,4444.....$2,$3,$4
Ouput:
1111,2222,3333.....$38 [10+5+3+15+3+2]
2222,3333,4444.....$15 [1+2+3+2+3+$]
Is this possible ?
Ofcourse its possible. First do a normal aggregation with the aggregator grouped on the keys. This will leave you with all unique combinations of the keys. Then you can add the columns in the transformer itself into just one column. There you go. Its done.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
This one really needs Aggregator ---> Transformer ---> Aggregator (though the accumulations of the grouped values could be achieved in stage variables in a Transformer stage rather than in the first Aggregator stage).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Well spotted, it's not necessary. I was solving a different problem.
The aggregator stage will generate:
Groupkey1,Groupkey2,Groupkey3.....Amt1,Amt2,Amt3....
1111,2222,3333.....$20,$8,$5
2222,3333,4444.....$3,$5,$7
The Transformer stage (adding the columns) will generate:
Groupkey1,Groupkey2,Groupkey3.....AmtTot
1111,2222,3333.....$38
2222,3333,4444.....$15
Make sure that the data are sorted and partitioned on GroupKey1, GroupKey2, GroupKey3 so that all members of each group are on the same processing node.
The aggregator stage will generate:
Groupkey1,Groupkey2,Groupkey3.....Amt1,Amt2,Amt3....
1111,2222,3333.....$20,$8,$5
2222,3333,4444.....$3,$5,$7
The Transformer stage (adding the columns) will generate:
Groupkey1,Groupkey2,Groupkey3.....AmtTot
1111,2222,3333.....$38
2222,3333,4444.....$15
Make sure that the data are sorted and partitioned on GroupKey1, GroupKey2, GroupKey3 so that all members of each group are on the same processing node.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Thanks all for suggestions, [though some answers, I can't see in full]. But my question is Would it be possible with Single Aggregator + anything besides Transformer [costly ?] or with Aggregator back-to-back ?ray.wurlod wrote:Well spotted, it's not necessary. I was solving a different problem.
The aggregator stage will generate:
Groupkey1,Groupkey2,Groupkey3.....Amt1,Amt2,Amt3....
1111,2222,3333.....$20,$8,$5
2222 ...
As pointed out, Aggregator cannot aggregate accross columns. For that you need a transformer. The only other option that i can think of right now is to do it in the sql itself if your source is a database by writing your own sql.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
You may find some or other post to deny that transformer cuases Performance degradation. And more over, otherthan adding up this in the source database stage, Transformer will be easieast way to accomplish it.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'