sorting in the Aggregator



wuruima
Participant
Posts: 65
Joined: Mon Nov 04, 2013 10:15 pm

sorting in the Aggregator

Post by wuruima »

Dear friends,

When using this stage, must we sort the data on the input link? That is, must we tick the 'Perform sort' check box? If we don't tick it, could the stage output incorrect results? Please kindly advise; thanks so much.

walter/
wuruimao
naveenkumar.ssn
Participant
Posts: 36
Joined: Thu Dec 03, 2009 9:11 pm
Location: Malaysia

Re: sorting in the Aggregator

Post by naveenkumar.ssn »

Hi,

It doesn't matter whether you sort or not; that depends on which aggregate function you are using. However, it is better to feed sorted data to the Aggregator for performance reasons.

Thanks & Regards
Naveen
Naveen Kumar
Datastage Consultant
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You have to use sorted data if the aggregation mode is Sort. The Aggregator makes use of the fact that data are sorted by grouping keys to minimize the amount of memory it needs - it only needs to keep one key value in memory.
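
As a rough illustration of why Sort mode needs sorted input, here is a minimal Python analogy (not DataStage code, and not how the Aggregator is actually implemented). The function name `sort_mode_sum` and the sample rows are made up for this sketch. It keeps only the current key and its running total, so if the same key shows up again later in unsorted data, it is treated as a new group.

```python
from typing import Iterable, Iterator, Tuple

def sort_mode_sum(rows: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Sum values per key, holding only ONE key in memory at a time.

    Hypothetical helper: correct only if rows arrive sorted by key.
    """
    current_key = None
    total = 0
    have_group = False
    for key, value in rows:
        if have_group and key != current_key:
            # Key changed, so the previous group is assumed complete.
            yield current_key, total
            total = 0
        current_key = key
        have_group = True
        total += value
    if have_group:
        yield current_key, total

sorted_rows = [("a", 1), ("a", 2), ("b", 5), ("c", 3), ("c", 4)]
unsorted_rows = [("a", 1), ("b", 5), ("a", 2), ("c", 3), ("c", 4)]

sorted_result = list(sort_mode_sum(sorted_rows))
unsorted_result = list(sort_mode_sum(unsorted_rows))

print(sorted_result)    # [('a', 3), ('b', 5), ('c', 7)]
print(unsorted_result)  # [('a', 1), ('b', 5), ('a', 2), ('c', 7)]
```

With the unsorted input, key "a" comes out twice with partial sums, which is the kind of incorrect result the original question asks about.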

Hash mode means that the Aggregator has to keep a table in memory with a row for every distinct value of grouping keys. If you estimate the size of this table at 1K per row, this will give you some feel for the amount of memory that would be required.
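
The same idea in a small Python sketch, again only an analogy with made-up names (`hash_mode_sum`, `estimate_table_bytes`), and taking the 1K-per-row figure above as a rule of thumb rather than a measured value. The table holds one entry per distinct key, so input order does not matter, but memory grows with the number of groups.

```python
from collections import defaultdict

BYTES_PER_GROUP = 1024  # assumption: the 1K-per-row rule of thumb

def hash_mode_sum(rows):
    """Sum values per key using an in-memory table, one entry per distinct key."""
    table = defaultdict(int)
    for key, value in rows:
        table[key] += value
    return dict(table)

def estimate_table_bytes(distinct_groups, bytes_per_group=BYTES_PER_GROUP):
    """Rough memory estimate for the hash table."""
    return distinct_groups * bytes_per_group

# Unsorted input is fine in this mode.
print(hash_mode_sum([("a", 1), ("b", 5), ("a", 2)]))  # {'a': 3, 'b': 5}

# Five million distinct groups at 1K each is about 5 GB.
print(estimate_table_bytes(5_000_000))  # 5120000000
```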

Hash mode is very suitable when there will be only a small number of distinct groups. Sort mode is highly suitable when there is a large number of distinct groups.

Hash mode is a "blocking operation" - that is, no rows can come out of the Aggregator until all input rows have been consumed. Sort mode is not a blocking operation (except at the individual group level, which is negligible).
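
To make the blocking difference concrete, this Python sketch counts how many input rows each approach has consumed before it can emit its first output row. It is an analogy only; `counting`, `sort_mode`, and `hash_mode` are hypothetical helpers, and `itertools.groupby` stands in for sort-mode grouping because it also relies on the input being sorted by key.

```python
from itertools import groupby

def counting(rows, counter):
    """Pass rows through while counting how many have been consumed."""
    for row in rows:
        counter[0] += 1
        yield row

def sort_mode(rows):
    """Streaming: emits a group as soon as the key changes (input must be sorted)."""
    for key, group in groupby(rows, key=lambda r: r[0]):
        yield key, sum(value for _, value in group)

def hash_mode(rows):
    """Blocking: must read every row before any group can be emitted."""
    table = {}
    for key, value in rows:
        table[key] = table.get(key, 0) + value
    yield from table.items()

rows = [("a", 1), ("a", 2), ("b", 5), ("c", 3)]

sort_seen = [0]
first_sort = next(sort_mode(counting(rows, sort_seen)))

hash_seen = [0]
first_hash = next(hash_mode(counting(rows, hash_seen)))

print(first_sort, sort_seen[0])  # ('a', 3) 3
print(first_hash, hash_seen[0])  # ('a', 3) 4
```

The streaming version emits the first group after reading only one row past it, while the table-based version has to consume all four rows first.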
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.