partitioning data on key1 and aggregating on key2

Posted: Tue Jan 24, 2006 8:00 pm
by satish_valavala
In a PX job, if data is partitioned on key1 and then aggregated on key2, what issues could arise?

Thx
VS

Re: partitioning data on key1 and aggregating on key2

Posted: Tue Jan 24, 2006 10:49 pm
by ray.wurlod
satish_valavala wrote: In a PX job, if data is partitioned on key1 and then aggregated on key2, what issues could arise?
Thx
VS
Are you sorting on key 2?

If not, the Aggregator stage will use a lot more memory than it would with input sorted on key2.
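
To illustrate the memory point, here is a minimal plain-Python sketch (not DataStage code). It assumes rows are simple (key2, amount) pairs; the function names are hypothetical. On sorted input a group can be finished as soon as key2 changes, while on unsorted input every group has to stay in memory until the last row has been read.

```python
from itertools import groupby
from operator import itemgetter

def aggregate_sorted(rows):
    """Stream over rows already sorted on key2; one group is totalled at a time."""
    for key2, group in groupby(rows, key=itemgetter(0)):
        yield key2, sum(amount for _, amount in group)

def aggregate_unsorted(rows):
    """Keep a running total for every key2 seen; memory grows with the number of groups."""
    totals = {}
    for key2, amount in rows:
        totals[key2] = totals.get(key2, 0) + amount
    return totals

# Illustrative data only: (key2, amount) pairs in arbitrary order.
rows = [("B", 5), ("A", 10), ("B", 7), ("A", 20)]

sorted_totals = dict(aggregate_sorted(sorted(rows, key=itemgetter(0))))
unsorted_totals = aggregate_unsorted(rows)
```

Both approaches give the same totals; the difference is only in how many groups must be held at once.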

Partitioning on anything other than key2 may mean that rows with the same key2 value end up on different nodes: some on node 1, some on node 2, and so on. Each node then aggregates only its own rows, so you may not get all the rows for a key2 value in one group in the final result. This is almost certainly not desirable.
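
The effect can be simulated outside DataStage. The following is a rough plain-Python sketch, assuming integer key1 values, modulus partitioning across two nodes, and a sum as the aggregation; the helper names and sample rows are made up for illustration.

```python
from collections import defaultdict

def partition_rows(rows, key, num_nodes):
    """Assign each row to a node by taking the integer key modulo the node count."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[row[key] % num_nodes].append(row)
    return nodes

def aggregate_per_node(nodes, group_key, value_key):
    """Sum value_key per group_key on each node independently of the other nodes."""
    output = []
    for node_id, node_rows in enumerate(nodes):
        totals = defaultdict(int)
        for row in node_rows:
            totals[row[group_key]] += row[value_key]
        output.extend((node_id, k, total) for k, total in sorted(totals.items()))
    return output

# Illustrative data only: two key2 values, each spread over several key1 values.
rows = [
    {"key1": 1, "key2": "A", "amount": 10},
    {"key1": 2, "key2": "A", "amount": 20},
    {"key1": 3, "key2": "B", "amount": 5},
    {"key1": 4, "key2": "B", "amount": 7},
]

by_key1 = partition_rows(rows, "key1", 2)
result_key1 = aggregate_per_node(by_key1, "key2", "amount")
# result_key1 == [(0, "A", 20), (0, "B", 7), (1, "A", 10), (1, "B", 5)]
```

There are only two distinct key2 values, yet four output rows come back, because each node produced its own partial total for "A" and for "B".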

Posted: Wed Jan 25, 2006 9:07 am
by satish_valavala
Thank you all

Posted: Fri Jan 27, 2006 1:07 pm
by kumar_s
Hi,
Just an update...
Once the data is partitioned on Key1, as long as you sort on both keys (Key1 and Key2), you can aggregate on both keys.
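
A small plain-Python sketch of why this works, assuming integer Key1 values, modulus partitioning across two nodes, and a sum as the aggregation (the function names and sample rows are hypothetical). All rows sharing a Key1 value land on the same node, so every (Key1, Key2) combination is also confined to one node.

```python
from collections import defaultdict

def partition_by_key1(rows, num_nodes):
    """Send each row to the node given by key1 modulo the node count."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[row["key1"] % num_nodes].append(row)
    return nodes

def sum_by_both_keys(nodes):
    """On each node, sort on (key1, key2) and total the amount per (key1, key2) group."""
    output = []
    for node_rows in nodes:
        totals = defaultdict(int)
        for row in sorted(node_rows, key=lambda r: (r["key1"], r["key2"])):
            totals[(row["key1"], row["key2"])] += row["amount"]
        output.extend(totals.items())
    return output

# Illustrative data only.
rows = [
    {"key1": 1, "key2": "A", "amount": 10},
    {"key1": 1, "key2": "A", "amount": 3},
    {"key1": 2, "key2": "A", "amount": 20},
    {"key1": 3, "key2": "B", "amount": 5},
    {"key1": 1, "key2": "B", "amount": 4},
]

groups = sum_by_both_keys(partition_by_key1(rows, 2))
group_keys = [key for key, _ in groups]
```

No (Key1, Key2) pair appears more than once in `group_keys`. Note that these are totals per Key1 and Key2 together, which is not the same result as a total per Key2 alone.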

-Kumar