Aggregator stage
Hello everybody,
I am using the parallel Aggregator stage in version 7.1.
Around 30 columns pass through it. Grouping needs to be done on 17 columns, and one column needs to be summed. The rest of the columns (numeric as well as char) need to pass through as they are, taking the first or last value.
My problem is how to get these remaining columns to pass through unchanged. The Aggregator stage does not carry these columns without derivations for them, and giving a derivation such as max/min would not help because these columns are char.
Can anyone please help me with this problem?
Thanks
Sort the data going into your Aggregator stage, then define the derivations for the rest of the columns as first/last, depending on your requirement. Also, don't forget to specify in the Aggregator how the columns are coming in, i.e. the sort order of the input columns.
HTH
Kris
Where's the "Any" key?-Homer Simpson
Yes, as Vinay mentioned, my problem is different.
I have the columns sorted, and it works well if I take only the grouping columns and the column to be summed; the aggregation is just fine. The problem is how to propagate the columns that are not part of the grouping key and need no aggregation performed on them.
When you group the data by seventeen columns, you get one output row per distinct combination of those seventeen columns. One of the columns is summed, so its derivation is SUM. That's fine so far. For each of the remaining columns you also get a single value per group (because you are grouping), so you have to decide what that value should be: the first or last based on a date, or the max or min, or something like that. That is where the sorting comes into the picture. If this still doesn't make sense, an example input dataset from the OP might help me explain it better.
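To make the "one value per group" point concrete, here is a rough sketch in plain Python. It is not DataStage code, only an illustration of the logic. The function name `aggregate_keep_last`, the column names, and the sample rows are all made up for the example, and two key columns stand in for the seventeen.

```python
from itertools import groupby

def aggregate_keep_last(rows, key_cols, sum_col, sort_col):
    """Group by key_cols, sum sum_col, and keep the last value of every other column."""
    def group_key(row):
        return tuple(row[c] for c in key_cols)

    # Sort by the grouping keys, then by the column that decides what "last" means.
    ordered = sorted(rows, key=lambda r: group_key(r) + (r[sort_col],))

    result = []
    for _, group in groupby(ordered, key=group_key):
        members = list(group)
        out = dict(members[-1])  # last row of the group supplies the pass-through columns
        out[sum_col] = sum(m[sum_col] for m in members)
        result.append(out)
    return result

# Hypothetical sample data.
rows = [
    {"cust": "A", "region": "N", "amount": 10, "load_date": "2005-01-01", "status": "open"},
    {"cust": "A", "region": "N", "amount": 5, "load_date": "2005-02-01", "status": "closed"},
    {"cust": "B", "region": "S", "amount": 7, "load_date": "2005-01-15", "status": "open"},
]
summary = aggregate_keep_last(rows, ["cust", "region"], "amount", "load_date")
```

The char column `status` is carried through without any max/min derivation; which value survives depends only on the sort order within each group.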
Kris
Where's the "Any" key?-Homer Simpson
Yes, what you say makes full sense, and yes, the other columns have to be given a single value, let's say the last value. Now how do I do that in the Aggregator stage?
Using the max function in the Aggregator stage does not help because they are char fields. I cannot see any option like last/first in the Aggregator stage, or any other option that would help me propagate these columns.
Can you please help me with how to propagate these columns (yes, with one value, say, the last)?
Thanks
I think there is a problem with the EE Aggregator stage when trying to pass char fields through. I think it has been raised with Ascential and they will fix it for the next release. We have used Transformer stages with bespoke aggregation logic. Alternatively, you can put a Server Aggregator into a shared container and use the shared container in your EE job, but I would only use the latter if it is not the main flow of data being aggregated, because the performance hit will be very large.
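For anyone wondering what "bespoke aggregation logic" might look like, here is a rough sketch of the usual key-change pattern, written in plain Python rather than as Transformer derivations. It assumes the input is already sorted on the grouping keys, and the names `stream_aggregate`, `k`, `amt` and `note` are invented for the example. It is only an illustration of the idea, not the actual job design described above.

```python
def stream_aggregate(sorted_rows, key_cols, sum_col):
    """Yield one row per key group from input already sorted by key_cols."""
    prev_key = None
    current = None   # most recent row seen in the current group
    running = 0      # running sum for the current group
    for row in sorted_rows:
        key = tuple(row[c] for c in key_cols)
        if current is not None and key != prev_key:
            # Key changed: emit the finished group, then reset the sum.
            yield {**current, sum_col: running}
            running = 0
        running += row[sum_col]
        current = row
        prev_key = key
    if current is not None:
        # Emit the final group once the input is exhausted.
        yield {**current, sum_col: running}

# Hypothetical input, already sorted on the key column "k".
sorted_input = [
    {"k": "A", "amt": 1, "note": "x"},
    {"k": "A", "amt": 2, "note": "y"},
    {"k": "B", "amt": 4, "note": "z"},
]
grouped = list(stream_aggregate(sorted_input, ["k"], "amt"))
```

Each emitted row carries the last value seen for the non-key columns, so char fields pass through without needing a max/min derivation.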
Regards,
Nick.
The same has been discussed here. But I believe IBM has already released a patch for passing strings through the Aggregator stage. Search this forum for more information.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'