Aggregator Stage Warning.


kollurianu
Premium Member
Posts: 614
Joined: Fri Feb 06, 2004 3:59 pm

Post by kollurianu »

Hi all,

I am getting warnings of the following type from the Aggregator stage:

Aggregator_33,3: Hash table has grown to 16384 entries.

What does that mean?

Any input is greatly appreciated.

Thank you.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It means that your hash table has grown to 16384 entries. That is, 16384 different combinations of grouping column values when HASH method is being used. It's meant to alert you to the fact that you're consuming a certain amount of memory, and might therefore consider switching to SORT method. Nothing is broken, but if you end up having very many more combinations of grouping column values, then things might start breaking.
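
To make the idea concrete, here is a minimal Python sketch of hash-style grouping. It is not DataStage code, and the function name, column names and data are all made up for illustration. It only shows that the in-memory table gets one entry per distinct combination of grouping column values, so its size follows the number of groups rather than the number of rows.

```python
from collections import defaultdict

def hash_aggregate(rows, key_fields, value_field):
    """Sum value_field per distinct combination of key_fields (hypothetical helper)."""
    table = defaultdict(int)  # one entry per distinct key combination
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        table[key] += row[value_field]
    return dict(table)

# Made-up data: 20000 rows that contain 128 * 128 = 16384 distinct
# (region, product) combinations.
rows = [
    {"region": i % 128, "product": (i // 128) % 128, "amount": 1}
    for i in range(20000)
]
result = hash_aggregate(rows, ["region", "product"], "amount")
print(len(result))  # number of table entries, i.e. number of groups
```

Every entry stays in memory until the input is exhausted, which is why a large number of groups is the thing to watch with this approach.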
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Re: Aggregator Stage Warning.

Post by pandeesh »

This has been discussed many times in this forum.
Please use the search option.
pandeeswaran
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

Switching to the sort method will not have any adverse effect on the results.
It would be better to change the method from hash to sort.

Ray, correct me if I am wrong.
pandeeswaran
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Depends what you mean by "adverse effects" I guess.

You should get identical results using either method. With sorted data you'll get them faster through the Aggregator stage, but much of that gain may be taken up sorting the data.
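
As a rough illustration only (plain Python, not DataStage, with made-up function names and data), the two approaches can be put side by side. The hash version keeps every group in a dictionary. The sorted version relies on the input already being ordered by key and only ever holds one group at a time. The `sorted()` call stands in for the upstream sort whose cost is mentioned above.

```python
from itertools import groupby
from operator import itemgetter

def aggregate_hash(rows):
    """Sum amounts per key using an in-memory table; input order does not matter."""
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return totals

def aggregate_sorted(rows):
    """Sum amounts per key; assumes rows are already sorted by key."""
    return {
        key: sum(amount for _, amount in group)
        for key, group in groupby(rows, key=itemgetter(0))
    }

# Made-up (key, amount) pairs in arbitrary order.
rows = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]

hash_result = aggregate_hash(rows)
sort_result = aggregate_sorted(sorted(rows, key=itemgetter(0)))  # sort first
print(hash_result == sort_result)
```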
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

Regardless of the job design, if we simply change the method from hash to sort, will there be any impact?

I guess not.
pandeeswaran
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you change the method to sort without sorting your data (implied by your use of the word "simply"), the impact will be that your job aborts.
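
A small Python sketch may help show why a sort-style aggregator cannot cope with unsorted input. This is an illustration under my own assumptions, not how DataStage itself detects the problem: the function name and the choice to raise an error are hypothetical. The aggregator emits a group as soon as the key changes, so a key that reappears later can no longer be merged into a total that has already been emitted.

```python
def sorted_group_sum(rows):
    """Yield (key, total) for key-sorted input; raise if a key arrives out of order."""
    current_key = None
    total = 0
    started = False
    for key, amount in rows:
        if started and key < current_key:
            # The earlier group may already have been emitted, so stop here.
            raise ValueError(f"input not sorted: {key!r} after {current_key!r}")
        if started and key != current_key:
            yield current_key, total  # key changed, so the previous group is complete
            total = 0
        current_key, started = key, True
        total += amount
    if started:
        yield current_key, total

ok = list(sorted_group_sum([("a", 1), ("a", 2), ("b", 3)]))

try:
    list(sorted_group_sum([("b", 3), ("a", 1)]))  # unsorted input
    failed = False
except ValueError:
    failed = True
print(ok, failed)
```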
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

Thanks ray!

For example, if we are grouping on 8 key fields in the Aggregator stage, does that mean the data must be sorted on all 8 fields as keys in a Sort stage before the Aggregator stage?

Thanks
pandeeswaran
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Not necessarily in a Sort stage (though I'd advocate that) but it's essential that they're sorted.
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

Sort on all 8 fields?

We can also reduce the load on the Sort stage by choosing "Don't Sort (Previously Sorted)" for keys that are already sorted.

Thanks
pandeeswaran
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yes, sorted on all eight grouping fields. It doesn't matter how or where you do that sorting.
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

Thanks Ray!!

Since we sometimes get the "Hash table has grown to 16384 entries" warning, why would we need the hash method at all?

As I understand it, we can always use the sort method, provided the records are sorted before they reach the Aggregator stage.

In which situations is the hash method the best practice?
pandeeswaran
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

As stated in the Parallel Job Developer Guide, the Aggregator's hash method is typically used for a relatively small number of groups (distinct key values). The more groups in your data, the more memory the operator uses as it builds its hash table of group values and calculations.

With the hash method you do not need to sort your data before it enters the Aggregator (although you should still partition it). For a large number of groups, pre-sorting the data and using the Sort method in the Aggregator is likely to be more efficient.
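
On the partitioning point, here is a minimal Python sketch of key-based hash partitioning. It is only an illustration: the helper names, the use of CRC32 as the hash, and the sample data are my own assumptions and not what the parallel engine does internally. The idea is that every row with the same grouping key lands in the same partition, so each partition can aggregate its own groups without seeing the others.

```python
import zlib

def partition_of(key, num_partitions):
    """Map a grouping key to a partition number (hypothetical helper, CRC32 assumed)."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions

def hash_partition(rows, key_fields, num_partitions):
    """Distribute rows so that equal keys always share a partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        partitions[partition_of(key, num_partitions)].append(row)
    return partitions

# Made-up input rows.
rows = [
    {"cust": c, "amount": a}
    for c, a in [("x", 1), ("y", 2), ("x", 3), ("z", 4), ("y", 5)]
]
parts = hash_partition(rows, ["cust"], 4)

# Record which partitions each customer ended up in.
homes = {}
for idx, part in enumerate(parts):
    for row in part:
        homes.setdefault(row["cust"], set()).add(idx)
print(homes)
```

If the data were partitioned on something other than the grouping keys, the same group could be split across partitions and each partition would emit its own partial total.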

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
kollurianu
Premium Member
Posts: 614
Joined: Fri Feb 06, 2004 3:59 pm

Post by kollurianu »

Hi all,

Even after sorting and partitioning the data on the key field (the grouping field), setting Method = Sort in the Aggregator stage produces the warning below, and the input file is big.
I am not sure what I am missing.

Aggregator_33: When checking operator: User inserted sort "Sort_22" does not fulfill the sort requirements of the downstream operator "APT_SortedGroup2Operator in Aggregator_33"

Can somebody clarify?

Appreciate your inputs.
pravin1581
Premium Member
Posts: 497
Joined: Sun Dec 17, 2006 11:52 pm
Location: Kolkata

Post by pravin1581 »

The sort keys must be defined in the same order as the grouping keys in the Aggregator stage.
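
As a quick way to reason about this, the rule can be sketched in Python. This is only my reading of it, not the check the engine actually performs: the function name and column names are hypothetical, and I am assuming the requirement is that the sort key list begins with the grouping keys in the same order.

```python
def sort_covers_grouping(sort_keys, group_keys):
    """True if the sort keys start with the grouping keys, in the same order."""
    return list(sort_keys[:len(group_keys)]) == list(group_keys)

# Hypothetical column names.
same_order = sort_covers_grouping(["cust_id", "region"], ["cust_id", "region"])
swapped = sort_covers_grouping(["region", "cust_id"], ["cust_id", "region"])
print(same_order, swapped)
```

Comparing the key list of the Sort stage with the grouping key list of the Aggregator in this way is a reasonable first thing to check when the "does not fulfill the sort requirements" warning appears.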