Hi,
I have a question regarding the Aggregator stage. I use the Aggregator stage in most of my jobs and the output looks good. But is it mandatory to hash partition and sort by the grouping keys in the Aggregator stage? Currently the partitioning is left at the default, Auto.
Thanks
Aggregator Partitioning
-
- Participant
- Posts: 437
- Joined: Fri Oct 21, 2005 10:00 pm
Re: Aggregator Partitioning
I think the question should really be: how does Auto partitioning work in a job with an Aggregator?
First, I would ask you to look at the dump score for your job to see what type of partitioning is occurring and where. This will answer your question.
Mandatory? No, it is not necessary to insert a hash partition in every case. However, in some cases it is necessary. Sorting depends on the aggregator method used:
"Use hash mode for a relatively small number of groups; generally, fewer than about 1000 groups per megabyte of memory. Sort mode requires the input data set to have been partition sorted with all of the grouping keys specified as hashing and sorting keys."
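To make the difference between the two methods concrete, here is a minimal Python sketch. It is not DataStage code; the function names and sample rows are made up for illustration. The hash-style function keeps one running total per group in memory, so input order does not matter but memory grows with the number of groups. The sort-style function holds only one group at a time, which only works if the input is already sorted on the grouping key.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_hash_mode(rows):
    """Hash-style: one running total per group is kept in memory."""
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return totals


def aggregate_sort_mode(rows_sorted_on_key):
    """Sort-style: emits a total each time the key changes.

    Only correct when the input is already sorted on the grouping key.
    """
    for key, group in groupby(rows_sorted_on_key, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)


# Hypothetical sample data: (grouping key, amount)
rows = [("B", 4), ("A", 1), ("B", 3), ("A", 2)]

hash_result = aggregate_hash_mode(rows)
sort_result = dict(aggregate_sort_mode(sorted(rows, key=itemgetter(0))))

# Feeding unsorted data to the sort-style function splits each group
# into several output rows instead of one.
unsorted_result = list(aggregate_sort_mode(rows))
```

With sorted input both functions agree; with unsorted input the sort-style version emits a separate row every time the key changes, which is why sort mode needs the data sorted on the grouping keys.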
Keith Williams
keith@peacefieldinc.com
-
- Participant
- Posts: 152
- Joined: Mon Mar 07, 2011 6:16 am
Re: Aggregator Partitioning
It is better to use hash partitioning whenever you are working with grouped data, as in the Aggregator, Sort, and Remove Duplicates stages.
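A rough Python sketch of why key-based partitioning matters for grouping stages, assuming two partitions and made-up sample rows. This is not DataStage code: the helper names are hypothetical and `zlib.crc32` is only a stand-in for whatever hash function the engine uses. Each partition is aggregated on its own, as it would be on separate nodes.

```python
import zlib


def round_robin_partition(rows, n):
    """Deal rows out to n partitions in turn, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def key_hash_partition(rows, n):
    """Send every row with the same key to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 is an illustrative stand-in, not the engine's hash function.
        parts[zlib.crc32(row[0].encode()) % n].append(row)
    return parts


def aggregate_each_partition(parts):
    """Sum amounts per key inside each partition independently."""
    results = []
    for part in parts:
        totals = {}
        for key, amount in part:
            totals[key] = totals.get(key, 0) + amount
        results.extend(totals.items())
    return results


# Hypothetical sample data: (grouping key, amount)
rows = [("A", 1), ("A", 2), ("B", 3), ("B", 4)]

split_groups = aggregate_each_partition(round_robin_partition(rows, 2))
whole_groups = aggregate_each_partition(key_hash_partition(rows, 2))
```

With round robin, each key ends up in both partitions, so the output contains two partial rows per key. With key-based partitioning, each key lives in exactly one partition and the output has one correct total per key.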
Soumya
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Re: Aggregator Partitioning
soumya5891 wrote: It is better to use hash partition whenever you are working on group of data like aggregator,sort,remove duplicate.
That's not always true. For example, if the grouping key is an integer of some kind, then Modulus should be preferred, as it's more efficient than Hash.
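As a small illustration of the idea (a Python sketch, not DataStage internals): a modulus partitioner only needs the remainder of the integer key divided by the number of partitions, while a hash partitioner first has to compute a hash over the key. The function names are hypothetical and `zlib.crc32` is just a stand-in hash. Both approaches keep equal keys together, which is what the grouping stages need.

```python
import zlib


def modulus_partition_number(key, partitions):
    """Integer keys only: the partition is the remainder of key / partitions."""
    return key % partitions


def hash_partition_number(key, partitions):
    """Any key type: hash the key first, then take the remainder."""
    # crc32 is an illustrative stand-in, not the engine's hash function.
    return zlib.crc32(str(key).encode()) % partitions


keys = [10, 11, 12, 13, 14, 10]
modulus_targets = [modulus_partition_number(k, 4) for k in keys]
hash_targets = [hash_partition_number(k, 4) for k in keys]
```

In both lists the two rows with key 10 are assigned to the same partition; the modulus version simply gets there with one arithmetic operation per row.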
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.