Aggregator Stage Warning.

Posted: Tue Jun 28, 2011 5:52 pm
by kollurianu
Hi All,

I am getting the following type of warning for the Aggregator stage:

Aggregator_33,3: Hash table has grown to 16384 entries.

What does that mean?

Any inputs greatly appreciated.

Thank you.

Posted: Tue Jun 28, 2011 6:22 pm
by ray.wurlod
It means that your hash table has grown to 16384 entries. That is, 16384 different combinations of grouping column values when HASH method is being used. It's meant to alert you to the fact that you're consuming a certain amount of memory, and might therefore consider switching to SORT method. Nothing is broken, but if you end up having very many more combinations of grouping column values, then things might start breaking.
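To illustrate the point about entries, here is a minimal Python sketch of hash-style grouping. It is not DataStage code; the function name, column names, and sample rows are made up for illustration. It only shows that the table holds one entry per distinct combination of grouping column values, which is what the warning is counting.

```python
from collections import defaultdict

def hash_aggregate(rows, key_fields, value_field):
    """Sum value_field per distinct combination of key_fields.

    Returns (totals, table_size). The table keeps one entry per
    distinct key combination, so memory grows with the number of groups.
    """
    table = defaultdict(int)
    for row in rows:
        key = tuple(row[f] for f in key_fields)  # one grouping combination
        table[key] += row[value_field]
    return dict(table), len(table)

# Hypothetical sample data: four rows, three distinct (region, product) pairs.
rows = [
    {"region": "EU", "product": "A", "amount": 10},
    {"region": "US", "product": "A", "amount": 5},
    {"region": "EU", "product": "A", "amount": 7},
    {"region": "EU", "product": "B", "amount": 1},
]

totals, table_size = hash_aggregate(rows, ["region", "product"], "amount")
print(table_size)  # number of entries held in memory at the end
```

Input order does not matter here, which is the convenience of the hash approach; the cost is that every group stays in memory until the input is exhausted.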

Re: Aggregator Stage Warning.

Posted: Tue Jun 28, 2011 8:49 pm
by pandeesh
This has been discussed many times in this forum.
Please use the search option.

Posted: Tue Jun 28, 2011 8:51 pm
by pandeesh
There will not be any adverse effects on the result from switching to the Sort method.
It would be better to change the method from hash to sort.

Ray, correct me if I am wrong.

Posted: Tue Jun 28, 2011 9:02 pm
by ray.wurlod
Depends what you mean by "adverse effects" I guess.

You should get identical results using either method. With sorted data you'll get them faster through the Aggregator stage, but much of that gain may be taken up sorting the data.

Posted: Tue Jun 28, 2011 10:39 pm
by pandeesh
Regardless of the job design, if we simply change the method from hash to sort, will there be any impact?

I guess 'no'.

Posted: Wed Jun 29, 2011 12:20 am
by ray.wurlod
If you change the method to sort without sorting your data (implied by your use of the word "simply"), the impact will be that your job aborts.
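As a rough analogy in Python (not DataStage code), a sort-style aggregator only compares each row with the previous one, so it depends on equal keys being adjacent. The sketch below uses `itertools.groupby`, which behaves the same way. The function and data are hypothetical. Where DataStage aborts the job, this sketch silently produces fragmented groups instead, which shows why the sorted input is required.

```python
from itertools import groupby

def sorted_aggregate(rows, key_fields, value_field):
    """Sum value_field per run of consecutive rows sharing the same key.

    Only correct when rows are already sorted on key_fields.
    """
    def key(row):
        return tuple(row[f] for f in key_fields)

    return [
        (k, sum(r[value_field] for r in group))
        for k, group in groupby(rows, key=key)
    ]

# Hypothetical sample data, deliberately unsorted on "region".
rows = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 5},
    {"region": "EU", "amount": 7},
]

# Unsorted input: "EU" is emitted twice because its rows are not adjacent.
unsorted_result = sorted_aggregate(rows, ["region"], "amount")

# Sorted input: each key is emitted exactly once.
sorted_rows = sorted(rows, key=lambda r: r["region"])
sorted_result = sorted_aggregate(sorted_rows, ["region"], "amount")

print(unsorted_result)
print(sorted_result)
```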

Posted: Wed Jun 29, 2011 12:33 am
by pandeesh
Thanks ray!

For example, if we are grouping in the Aggregator stage on 8 key fields,
does that mean the data should be sorted on all 8 fields as keys in a Sort stage prior to the Aggregator stage?

Thanks

Posted: Wed Jun 29, 2011 2:14 am
by ray.wurlod
Not necessarily in a Sort stage (though I'd advocate that) but it's essential that they're sorted.

Posted: Wed Jun 29, 2011 2:22 am
by pandeesh
Sort on all the 8 fields?

We can even reduce the burden in the Sort stage by specifying "don't sort (previously sorted)" for keys that are already sorted.

Thanks

Posted: Wed Jun 29, 2011 4:16 am
by ray.wurlod
Yes, sorted on all eight grouping fields. It doesn't matter how or where you do that sorting.
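A small Python sketch (an analogy only, with made-up data and two grouping fields instead of eight) shows why sorting on a subset of the grouping fields is not enough: rows with the same full key can still end up non-adjacent.

```python
from itertools import groupby

# Hypothetical rows: (region, product, amount); grouping is on (region, product).
rows = [
    ("EU", "A", 10),
    ("US", "A", 5),
    ("EU", "B", 1),
    ("EU", "A", 7),
]

def count_groups(ordered_rows):
    """Count runs of consecutive rows that share the full grouping key."""
    return sum(1 for _ in groupby(ordered_rows, key=lambda r: (r[0], r[1])))

# Sorted on the first grouping field only: ("EU", "A") rows are split by ("EU", "B").
partially_sorted = sorted(rows, key=lambda r: r[0])

# Sorted on every grouping field: each combination forms one contiguous run.
fully_sorted = sorted(rows, key=lambda r: (r[0], r[1]))

print(count_groups(partially_sorted))
print(count_groups(fully_sorted))
```

There are three distinct combinations in this data, so only the fully sorted input yields the right number of groups.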

Posted: Wed Jun 29, 2011 4:21 am
by pandeesh
Thanks Ray!!

Since we sometimes get the "Hash table has grown to 16384 entries" warning, what is the need for the hash method at all?

As I understand it, we can always use the sort method, provided the records are sorted before the Aggregator stage.

On which occasions is using the hash method best practice?

Posted: Wed Jun 29, 2011 11:46 am
by jwiles
As stated in the Parallel Job Developer Guide, Aggregator's hash method is typically used for a relatively small number of groups (distinct key values). The more groups in your data, the more memory will be used by the operator as it builds its hash table of group values and calculations.

The hash method means you do not need to sort your data before it enters the Aggregator (although you should still partition it). For a large number of groups, pre-sorting the data and using the Sort method in Aggregator is likely to be more efficient.
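On the partitioning remark, here is a hedged Python sketch of the idea, not of how the parallel engine implements it. Rows are assigned to partitions by a hash of the grouping key, so every row for a given key lands in the same partition and each partition can aggregate independently. The function names, the use of `zlib.crc32`, and the sample data are all assumptions for illustration.

```python
import zlib
from collections import defaultdict

def partition_by_key(rows, key_fields, num_partitions):
    """Assign each row to a partition based on a hash of its grouping key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        key = "|".join(str(row[f]) for f in key_fields)
        # crc32 is deterministic across runs, unlike Python's built-in str hash.
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return partitions

def aggregate_partition(rows, key_fields, value_field):
    """Hash-style sum per key within a single partition."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[f] for f in key_fields)] += row[value_field]
    return dict(totals)

# Hypothetical sample data.
rows = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 5},
    {"region": "EU", "amount": 7},
    {"region": "APAC", "amount": 3},
]

partitions = partition_by_key(rows, ["region"], num_partitions=2)
per_partition = [aggregate_partition(p, ["region"], "amount") for p in partitions]

# Because no key is split across partitions, merging is a plain union.
merged = {}
for result in per_partition:
    merged.update(result)
print(merged)
```

If the data were partitioned on something other than the grouping key, the same key could appear in several partitions and each would emit its own partial total.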

Regards,

Posted: Wed Jun 29, 2011 2:15 pm
by kollurianu
Hi All,

Even after sorting and partitioning the data on the key field (the grouping field),
setting Method = Sort in the Aggregator stage causes the warning below, and the input file is big.
I am not sure what I am missing.

Aggregator_33: When checking operator: User inserted sort "Sort_22" does not fulfill the sort requirements of the downstream operator "APT_SortedGroup2Operator in Aggregator_33"

Can somebody clarify?

Appreciate your inputs.

Posted: Wed Jun 29, 2011 2:39 pm
by pravin1581
The sort keys and the grouping keys in the Aggregator stage should be defined in the same order.
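One way to read that advice is as a check you could do by hand before running the job. The Python sketch below is purely illustrative: the helper name and column names are hypothetical, and it is not the validation DataStage performs. It assumes the requirement is that the sort keys begin with the grouping keys, in the same order.

```python
def sort_matches_grouping(sort_keys, group_keys):
    """Return True if sort_keys starts with group_keys in the same order.

    Assumption: extra sort keys after the grouping keys are harmless.
    """
    return list(sort_keys[:len(group_keys)]) == list(group_keys)

# Hypothetical column names.
group_keys = ["cust_id", "order_date"]

same_order = sort_matches_grouping(["cust_id", "order_date"], group_keys)
swapped_order = sort_matches_grouping(["order_date", "cust_id"], group_keys)
missing_key = sort_matches_grouping(["cust_id"], group_keys)

print(same_order, swapped_order, missing_key)
```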