Dear friends,
When using this stage, do we have to sort the data on the input link, that is, tick the 'Perform sort' check box? If we don't tick it, could the output be incorrect? Please advise, thanks so much.
Walter
sorting in the Aggregator
wuruimao
Re: sorting in the Aggregator
Hi,
Whether or not you need to sort depends on which aggregate function you are using. However, it is better to pass sorted input to the aggregate function, because it improves performance.
Thanks & Regards
Naveen
Naveen Kumar
DataStage Consultant
You have to use sorted data if the aggregation mode is Sort. The Aggregator relies on the data being sorted by the grouping keys to minimize the amount of memory it needs: it only needs to keep one key value in memory at a time.
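To illustrate why Sort mode needs sorted input, here is a minimal Python sketch of the idea, assuming a simple per-key sum. It is a conceptual model only, not DataStage's implementation; `sort_mode_sum` and the sample rows are hypothetical. Because the model holds only the current key, a key that reappears after a different key is emitted as a second, separate group:

```python
def sort_mode_sum(rows):
    """Conceptual model of sort-mode aggregation (sum per key).

    Assumes rows arrive sorted by key; only the current key and its
    running total are held in memory.
    """
    current_key = None
    total = 0
    have_group = False
    for key, value in rows:
        if have_group and key != current_key:
            # Key changed: the previous group is complete, so emit it.
            yield current_key, total
            total = 0
        current_key = key
        total += value
        have_group = True
    if have_group:
        # Emit the final group once the input is exhausted.
        yield current_key, total


sorted_rows = [("A", 1), ("A", 2), ("B", 5), ("C", 1), ("C", 1)]
unsorted_rows = [("A", 1), ("B", 5), ("A", 2), ("C", 1), ("C", 1)]

print(list(sort_mode_sum(sorted_rows)))
# [('A', 3), ('B', 5), ('C', 2)]
print(list(sort_mode_sum(unsorted_rows)))
# [('A', 1), ('B', 5), ('A', 2), ('C', 2)]  <- key A split into two groups
```

In this model the unsorted input does not raise an error; it silently produces one output row per run of equal keys, which is the kind of incorrect result the original question asks about.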
Hash mode means that the Aggregator has to keep a table in memory with a row for every distinct value of the grouping keys. If you estimate the size of this table at 1K per row, this will give you some feel for the amount of memory that would be required.
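A comparable sketch of Hash mode, again a hypothetical Python model rather than the Aggregator's real code, keeps one dictionary entry per distinct key, so input order does not matter. The `estimate_table_bytes` helper is also hypothetical and simply applies the 1K-per-row rule of thumb above; real row sizes depend on your key and output columns.

```python
def hash_mode_sum(rows):
    """Conceptual model of hash-mode aggregation (sum per key).

    Input order does not matter, but one table entry is kept for every
    distinct key until all input has been read.
    """
    table = {}
    for key, value in rows:
        table[key] = table.get(key, 0) + value
    return table


def estimate_table_bytes(distinct_groups, bytes_per_row=1024):
    """Rough memory estimate using the 1K-per-row rule of thumb."""
    return distinct_groups * bytes_per_row


print(hash_mode_sum([("A", 1), ("B", 5), ("A", 2), ("C", 1), ("C", 1)]))
# {'A': 3, 'B': 5, 'C': 2}
print(estimate_table_bytes(1_000) / 1024**2)       # about 1 MiB for 1,000 groups
print(estimate_table_bytes(10_000_000) / 1024**3)  # about 9.5 GiB for 10 million groups
```

The two estimates show why the number of distinct groups, not the number of input rows, drives the choice between the two modes.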
Hash mode is very suitable when there will be only a small number of distinct groups. Sort mode is highly suitable when there is a large number of distinct groups.
Hash mode is a "blocking operation" - that is, no rows can come out of the Aggregator until all input rows have been consumed. Sort mode is not a blocking operation (except at the individual group level, which is negligible).
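The blocking difference can be seen by counting how many input rows each model reads before it produces its first output row. This is a hypothetical Python sketch (`counted`, `sort_mode_sum` and `hash_mode_sum` are illustrative names, not DataStage APIs), using generators to stand in for rows flowing through the stage:

```python
def counted(rows, log):
    """Yield rows one at a time, recording each row as it is read."""
    for row in rows:
        log.append(row)
        yield row


def sort_mode_sum(rows):
    # Streams: emits each group as soon as the key changes (input must be sorted).
    current_key, total, have_group = None, 0, False
    for key, value in rows:
        if have_group and key != current_key:
            yield current_key, total
            total = 0
        current_key, total, have_group = key, total + value, True
    if have_group:
        yield current_key, total


def hash_mode_sum(rows):
    # Blocks: must read every row before it can emit anything.
    table = {}
    for key, value in rows:
        table[key] = table.get(key, 0) + value
    yield from table.items()


data = [("A", 1), ("A", 2), ("B", 5), ("C", 1), ("C", 1)]

read = []
first = next(sort_mode_sum(counted(data, read)))
print(first, "after reading", len(read), "rows")  # ('A', 3) after reading 3 rows

read = []
first = next(hash_mode_sum(counted(data, read)))
print(first, "after reading", len(read), "rows")  # ('A', 3) after reading 5 rows
```

The sort-mode model only has to read one row past the end of group A (the first B) to know that A is finished, which is the small per-group delay mentioned above; the hash-mode model reads the whole input first.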
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.