Hello Gurus,
In the Aggregator stage, there are two methods:
the "hash" method requires hash partitioning on the grouping keys;
the "sort" method also requires hash partitioning of the input on the grouping keys.
What's the difference between these two methods?
Thanks very much!
methods for aggregator stage
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
The difference is how memory is managed.
The HASH method builds a hash table in memory with one entry for each combination of grouping values. It cannot generate any output rows until all rows have entered the Aggregator stage.
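A minimal Python sketch of the HASH method as described, assuming a simple count/sum aggregation over rows held as dicts. The function name `hash_aggregate` and the row layout are hypothetical illustrations, not DataStage internals. Memory grows with the number of distinct groups, and the first result only appears after the last input row has been read.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def hash_aggregate(rows: Iterable[dict], keys: tuple, value: str) -> Iterator[dict]:
    """Sketch of HASH-method aggregation (hypothetical, not DataStage code).

    One table entry is kept per distinct combination of grouping values,
    so memory grows with the number of groups. No output row can be
    produced until every input row has been consumed.
    """
    table = defaultdict(lambda: [0, 0])  # grouping values -> [count, sum]
    for row in rows:
        entry = table[tuple(row[k] for k in keys)]
        entry[0] += 1
        entry[1] += row[value]
    # Only now, after the last input row, can results be emitted.
    for key, (count, total) in table.items():
        yield {**dict(zip(keys, key)), "count": count, "sum": total}
```

In a parallel job each partition would run its own copy of this logic, which is why the input must be hash-partitioned on the grouping keys: every row of a group has to reach the same partition.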
The SORT method does the same, but it flushes and frees that memory whenever the value of any of the sorted (grouping) columns changes. It can do that because the input is sorted on those columns, so a group whose key value has just changed will never be seen again. Overall, this uses far less memory than the HASH method.
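A matching sketch of the SORT method, under the same assumptions (hypothetical function name `sort_aggregate`, count/sum over dict rows, input already sorted on the grouping keys). Only the group currently being accumulated is held in memory; a change in any key column flushes that group immediately.

```python
from typing import Iterable, Iterator


def sort_aggregate(rows: Iterable[dict], keys: tuple, value: str) -> Iterator[dict]:
    """Sketch of SORT-method aggregation (hypothetical, not DataStage code).

    Assumes `rows` is already sorted on `keys`. Only the current group is
    held in memory; when any key column changes value, that group can
    never reappear, so its result is emitted and its state discarded.
    """
    current = None  # grouping values of the group being accumulated
    count = total = 0
    for row in rows:
        key = tuple(row[k] for k in keys)
        if key != current:
            if current is not None:
                # Key changed: flush the finished group and free its state.
                yield {**dict(zip(keys, current)), "count": count, "sum": total}
            current, count, total = key, 0, 0
        count += 1
        total += row[value]
    if current is not None:
        # Flush the final group at end of input.
        yield {**dict(zip(keys, current)), "count": count, "sum": total}
```

Fed unsorted input such as A, B, A, this sketch would emit three rows instead of two, which is why the SORT method depends on the input being sorted on the grouping keys.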
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.