Aggregator Partitioning

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kpsita
Participant
Posts: 99
Joined: Tue Jul 21, 2009 11:43 pm

Aggregator Partitioning

Post by kpsita »

Hi,

I have a question about the Aggregator stage. I use the Aggregator stage in most of my jobs, and the output looks correct. But is it mandatory to hash partition and sort by the grouping keys in the Aggregator stage? Currently the partitioning is left at the default, Auto.

Thanks
KPSITA
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: Aggregator Partitioning

Post by kwwilliams »

I think the real question is: how does Auto partitioning work in a job with an Aggregator?

First, look at the dump score for the job (set the environment variable APT_DUMP_SCORE to True so the score is written to the job log) to see what type of partitioning is occurring and where. That will answer your question.

Mandatory? No, it is not necessary in every case to insert a hash partitioner. In some cases, however, it is necessary. Whether sorting is needed depends on the aggregation method used:

"Use hash mode for a relatively small number of groups; generally, fewer than about 1000 groups per megabyte of memory. Sort mode requires the input data set to have been partition sorted with all of the grouping keys specified as hashing and sorting keys."
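To illustrate the difference between the two methods quoted above, here is a minimal Python sketch. It is not how the DataStage engine implements the Aggregator; it only models the idea, assuming hypothetical rows of (key, amount) pairs. Hash mode keeps one running total per group in memory, so input order does not matter but memory grows with the number of groups. Sort mode emits a result each time the key changes, so it holds only one group at a time, but a group gets split if the input is not sorted on the grouping key.

```python
from itertools import groupby
from operator import itemgetter


def hash_mode_sum(rows):
    """Hash-mode style: one running total per group, held in memory.
    Input order does not matter; memory grows with the number of groups."""
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return totals


def sort_mode_sum(rows):
    """Sort-mode style: emit one result each time the key changes.
    Only the current group is held, but the input must already be
    sorted on the grouping key or a group is split into several results."""
    return [(key, sum(amount for _, amount in group))
            for key, group in groupby(rows, key=itemgetter(0))]


rows = [("A", 1), ("B", 2), ("A", 3)]
print(hash_mode_sum(rows))           # {'A': 4, 'B': 2}
print(sort_mode_sum(rows))           # [('A', 1), ('B', 2), ('A', 3)]  (group A split)
print(sort_mode_sum(sorted(rows)))   # [('A', 4), ('B', 2)]
```

The unsorted call shows why the documentation requires sort-mode input to be partition sorted on all grouping keys.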
soumya5891
Participant
Posts: 152
Joined: Mon Mar 07, 2011 6:16 am

Re: Aggregator Partitioning

Post by soumya5891 »

It is better to use hash partitioning whenever you are working on groups of data, as in the Aggregator, Sort, and Remove Duplicates stages.
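A rough Python model of why the grouping keys matter for partitioning: each parallel Aggregator instance only sees the rows in its own partition. The functions below are hypothetical, and zlib.crc32 is an arbitrary stand-in hash, not the one DataStage uses. With round-robin partitioning, rows for the same key can land in different partitions, and each partition produces its own partial total; key-based partitioning keeps every group in one partition.

```python
import zlib


def round_robin_partition(rows, n):
    """Deal rows out to n partitions in turn, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows, n):
    """Send each row to a partition chosen from a hash of its key.
    zlib.crc32 is a stand-in: it is stable across runs, unlike str hash()."""
    parts = [[] for _ in range(n)]
    for key, amount in rows:
        parts[zlib.crc32(key.encode("utf-8")) % n].append((key, amount))
    return parts


def aggregate_each_partition(parts):
    """Sum amounts per key inside each partition independently, the way
    separate parallel aggregator instances would, then concatenate."""
    results = []
    for part in parts:
        totals = {}
        for key, amount in part:
            totals[key] = totals.get(key, 0) + amount
        results.extend(sorted(totals.items()))
    return results


rows = [("A", 1), ("A", 2), ("B", 5), ("B", 7)]
print(aggregate_each_partition(round_robin_partition(rows, 2)))
print(aggregate_each_partition(hash_partition(rows, 2)))
```

On 2 partitions, round-robin yields four partial totals (A twice, B twice), while hash partitioning yields exactly one total per key: ("A", 3) and ("B", 12).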
Soumya
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Re: Aggregator Partitioning

Post by ray.wurlod »

soumya5891 wrote: It is better to use hash partitioning whenever you are working on groups of data, as in the Aggregator, Sort, and Remove Duplicates stages.
That's not always true. For example, if the grouping key is an integer of some kind, then Modulus partitioning should be preferred, as it is more efficient than Hash.
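Modulus partitioning can be sketched in Python as below, assuming an integer grouping key and hypothetical (key, amount) rows. This is a simplified model, not the engine's implementation: the partition number is simply the key mod the number of partitions, so no hash has to be computed, and every row with the same key still lands in the same partition, which is what the Aggregator needs.

```python
def modulus_partition(rows, n):
    """Modulus-style partitioning for an integer key: the partition number
    is key % n, so no hash function is evaluated. Rows with equal keys
    always land in the same partition."""
    parts = [[] for _ in range(n)]
    for key, amount in rows:
        parts[key % n].append((key, amount))
    return parts


rows = [(10, 1), (11, 2), (10, 3), (13, 4)]
print(modulus_partition(rows, 3))
# [[], [(10, 1), (10, 3), (13, 4)], [(11, 2)]]
```

Both rows for key 10 end up together in partition 1, so a per-partition aggregation still produces a single total for that key.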
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.