methods for aggregator stage

cetzhbo
Premium Member
Posts: 38
Joined: Tue Aug 28, 2007 10:20 am

Post by cetzhbo »

Hello Gurus,

In the Aggregator stage, there are two methods:

Method "hash" requires hash partitioning of the input on the grouping keys.
Method "sort" also requires hash partitioning of the input on the grouping keys.

What is the difference between these two methods?

Thanks very much!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The difference is how memory is managed.

The HASH method builds a hash table in memory with one entry for each distinct combination of grouping key values. It cannot generate any output rows until all input rows have entered the Aggregator stage, so its memory use grows with the number of distinct groups.
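
As a rough illustration only (plain Python, not DataStage code), the hash approach can be sketched as below. The function name `hash_aggregate`, the sample rows, and the choice of a sum as the aggregate are all assumptions made for the example.

```python
from collections import defaultdict

def hash_aggregate(rows, key, value):
    """Sum `value` per `key`, keeping one table entry per distinct key."""
    totals = defaultdict(int)
    for row in rows:
        # Every input row has to be consumed before any result is final,
        # because a later row may still belong to an earlier group.
        totals[row[key]] += row[value]
    # Only now, with the input exhausted, can output be produced.
    return dict(totals)

# Hypothetical sample data; the input does not need to be sorted.
rows = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
]
print(hash_aggregate(rows, "region", "amount"))
```

The table holds an entry for every distinct group at the same time, which is why this style suits inputs with relatively few groups.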

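The SORT method performs the same aggregation but expects its input to be sorted on the grouping keys. It flushes the current group to the output and frees that memory whenever any of the sorted grouping columns changes value. It can do that because, since the input is sorted, we know that the previous combination of values will never be seen again. It only ever holds one group at a time, so when there are many distinct groups it uses far less memory than the HASH method.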
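
A minimal sketch of the sort approach, again in plain Python rather than DataStage: the function name `sort_aggregate`, the sample rows, and the sum aggregate are assumptions for illustration, and the input is assumed to be already sorted on the grouping key.

```python
def sort_aggregate(sorted_rows, key, value):
    """Yield (key, sum) pairs from rows already sorted on `key`."""
    current_key = None
    running_total = 0
    have_group = False
    for row in sorted_rows:
        if have_group and row[key] != current_key:
            # The key changed: the previous group is complete, so emit it
            # and reset the accumulator instead of keeping it in memory.
            yield current_key, running_total
            running_total = 0
        current_key = row[key]
        have_group = True
        running_total += row[value]
    if have_group:
        # Emit the final group once the input ends.
        yield current_key, running_total

# Hypothetical sample data, sorted on the grouping key before aggregation.
rows = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
]
rows.sort(key=lambda r: r["region"])
for region, total in sort_aggregate(rows, "region", "amount"):
    print(region, total)
```

Only one running total is held at any moment, and results are emitted as soon as each group ends. If the input were not sorted, the same key could show up in more than one output row.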
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.