Sorting in the Aggregator

wuruima · Post by **wuruima** » Mon Oct 12, 2015 7:47 pm

Dear friends,

When using this stage, do we have to sort the data on the input link, that is, tick the 'Perform sort' check box? If we don't tick it, could the stage output incorrect results? Please kindly advise. Thanks so much.

Walter

naveenkumar.ssn · Post by **naveenkumar.ssn** » Mon Oct 12, 2015 9:00 pm

Hi,

Whether or not you need to sort depends on which aggregate function you are using. However, it is better to feed sorted input to the aggregate function for performance.

Thanks & Regards
Naveen

ray.wurlod · Post by **ray.wurlod** » Mon Oct 12, 2015 10:22 pm

You have to use sorted data if the aggregation mode is Sort. The Aggregator relies on the data being sorted by the grouping keys to minimize the amount of memory it needs: it only needs to keep one key value in memory.
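A minimal Python sketch of that idea, not DataStage's actual implementation: the hypothetical function `sort_mode_sum` sums a value per key while holding only the current key and its running total. It assumes the input arrives sorted by key. With unsorted input, this sketch emits a separate partial row every time a key reappears, which illustrates how unsorted data can give incorrect results in Sort mode.

```python
from typing import Iterable, Iterator, Tuple

def sort_mode_sum(rows: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Sum values per key, assuming rows arrive sorted by key.

    Only the current key and its running total are held in memory.
    """
    current_key = None
    total = 0
    started = False
    for key, value in rows:
        if started and key != current_key:
            # Key changed: the previous group is complete, so emit it.
            yield (current_key, total)
            total = 0
        current_key = key
        total += value
        started = True
    if started:
        # Emit the last group once the input is exhausted.
        yield (current_key, total)

sorted_rows = [("A", 1), ("A", 2), ("B", 5)]
print(list(sort_mode_sum(sorted_rows)))    # [('A', 3), ('B', 5)]

unsorted_rows = [("A", 1), ("B", 5), ("A", 2)]
print(list(sort_mode_sum(unsorted_rows)))  # [('A', 1), ('B', 5), ('A', 2)]
```

Note how key A shows up twice in the second result, each time with only part of its total.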

Hash mode means that the Aggregator has to keep a table in memory with a row for every distinct value of the grouping keys. If you estimate the size of this table at 1K per row, that will give you some feel for the amount of memory that would be required.
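For comparison, a minimal Python sketch of Hash-mode grouping (hypothetical names, with summation standing in for whatever aggregate function you configure): a dictionary holds one entry per distinct key, so input order does not matter, but memory grows with the number of groups. The `estimate_table_bytes` helper simply applies the 1K-per-row rule of thumb above; it is a rough planning figure, not a measurement of the Aggregator's real footprint.

```python
from typing import Dict, Iterable, List, Tuple

BYTES_PER_GROUP = 1024  # rough rule of thumb: about 1K per table row

def hash_mode_sum(rows: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Sum values per key without requiring sorted input.

    One table entry is kept for every distinct key seen.
    """
    table: Dict[str, int] = {}
    for key, value in rows:
        table[key] = table.get(key, 0) + value
    # Results are only available after all input has been read.
    return list(table.items())

def estimate_table_bytes(distinct_groups: int,
                         bytes_per_group: int = BYTES_PER_GROUP) -> int:
    """Estimate hash-table memory as groups times bytes per group."""
    return distinct_groups * bytes_per_group

print(hash_mode_sum([("A", 1), ("B", 5), ("A", 2)]))  # [('A', 3), ('B', 5)]
print(estimate_table_bytes(1_000_000) / 1024**2)      # 976.5625 (MiB)
```

At 1K per group, one million distinct groups comes to roughly 1 GB, whereas a few hundred groups need well under a megabyte.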

Hash mode is very suitable when there will be only a small number of distinct groups. Sort mode is highly suitable when there is a large number of distinct groups.

Hash mode is a "blocking operation" - that is, no rows can come out of the Aggregator until all input rows have been consumed. Sort mode is not a blocking operation (except at the individual group level, which is negligible).
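The blocking difference can be seen by counting how many input rows each approach reads before producing its first output row. Both aggregators below are hypothetical Python stand-ins that count rows per key, not DataStage code: the sort-style one emits a group as soon as the key changes, while the hash-style one can only emit after its loop has read every row.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, int]

def sort_mode_count(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    """Count rows per key, assuming input is sorted by key (streaming)."""
    current, n = "", 0
    for key, _ in rows:
        if n and key != current:
            yield (current, n)  # previous group is complete
            n = 0
        current, n = key, n + 1
    if n:
        yield (current, n)

def hash_mode_count(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    """Count rows per key in any order (blocking)."""
    table: Dict[str, int] = {}
    for key, _ in rows:
        table[key] = table.get(key, 0) + 1
    # Nothing is yielded until every input row has been read.
    yield from table.items()

def rows_read_before_first_output(
    aggregate: Callable[[Iterable[Row]], Iterator[Tuple[str, int]]],
    data: List[Row],
) -> Tuple[Tuple[str, int], int]:
    """Return the first output row and how many input rows were read by then."""
    consumed = 0

    def source() -> Iterator[Row]:
        nonlocal consumed
        for row in data:
            consumed += 1  # count each row as the aggregator pulls it
            yield row

    first = next(aggregate(source()))
    return first, consumed

data = [("A", 1), ("A", 1), ("B", 1), ("C", 1)]
print(rows_read_before_first_output(sort_mode_count, data))  # (('A', 2), 3)
print(rows_read_before_first_output(hash_mode_count, data))  # (('A', 2), 4)
```

With four input rows, the sort-style version produces its first group after reading three rows, while the hash-style version has to read all four.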

DSXchange
