partitioning data on key1 and aggregating on key2

satish_valavala · Post by **satish_valavala** » Tue Jan 24, 2006 8:00 pm

In a PX job, if data is partitioned on key 1 and then aggregated on key 2, what issues could arise?

Thx
VS

ray.wurlod · Post by **ray.wurlod** » Tue Jan 24, 2006 10:49 pm

satish_valavala wrote: In a PX job, if data is partitioned on key 1 and then aggregated on key 2, what issues could arise?
Thx
VS

Are you sorting on key 2?

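If not, the Aggregator stage will use a lot more memory than otherwise. Without sorted input it has to keep a running result for every key2 group until the end of its input; with input sorted on key2 it can output each group as soon as the key value changes and then release it.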
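The memory difference can be sketched in plain Python. This is a minimal model under stated assumptions, not the Aggregator's actual implementation: `hash_aggregate` and `sort_aggregate` are hypothetical names, the rows are made up, and "open groups" stands in for memory use.

```python
# A plain-Python model of two grouping strategies, not DataStage code.
# Rows are hypothetical (key2, amount) pairs.

def hash_aggregate(rows):
    """Hash-style: keep a running total for every key2 value seen, because
    no group is known to be complete until the input ends."""
    totals = {}
    peak_open_groups = 0
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
        peak_open_groups = max(peak_open_groups, len(totals))
    return totals, peak_open_groups


def sort_aggregate(sorted_rows):
    """Sort-style: input must already be sorted on key2. A group is emitted as
    soon as the key changes, so only one group is held open at a time."""
    current_key, running, have_group = None, 0, False
    for key, amount in sorted_rows:
        if have_group and key != current_key:
            yield current_key, running  # group complete: emit it and forget it
            running = 0
        current_key = key
        running += amount
        have_group = True
    if have_group:
        yield current_key, running


rows = [(k % 1000, 1) for k in range(10000)]  # 1,000 distinct key2 values
totals, peak = hash_aggregate(rows)
print(peak)  # 1000: every group held at once in hash style
print(dict(sort_aggregate(sorted(rows))) == totals)  # True: same totals, one open group
```

If the input is not actually sorted, the sort-style version emits the same key more than once: `[(1, 5), (2, 3), (1, 2)]` produces three groups, with key 1 appearing twice. That is why sorting on key2 matters before grouping this way.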

Partitioning on anything other than key2 may mean that rows with the same key2 value end up on different nodes: some on node 1, some on node 2, and so on. Each node aggregates only the rows it holds, so instead of one group per key2 value in the final result you may get several partial groups. This is almost certainly not desirable.
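A small plain-Python simulation of the split-group problem, assuming a hypothetical modulus-style partitioner (each row goes to node `key % n_nodes`) and per-node aggregation with no merge across nodes. The rows and the function names `partition` and `aggregate_per_node` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (key1, key2, amount). A plain-Python model, not DataStage code.
ROWS = [(1, 7, 10), (2, 7, 20), (3, 8, 5), (4, 8, 7)]


def partition(rows, key_index, n_nodes):
    """Modulus-style partitioning: each row goes to node (key value % n_nodes)."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[row[key_index] % n_nodes].append(row)
    return nodes


def aggregate_per_node(nodes, group_index):
    """Each node sums amounts for its own rows only; the per-node results are
    then simply concatenated, with no merge across nodes."""
    output = []
    for node_rows in nodes:
        totals = defaultdict(int)
        for row in node_rows:
            totals[row[group_index]] += row[2]
        output.extend(totals.items())
    return output


by_key1 = aggregate_per_node(partition(ROWS, 0, 2), 1)  # partition key1, group key2
by_key2 = aggregate_per_node(partition(ROWS, 1, 2), 1)  # partition key2, group key2
print(sorted(by_key1))  # [(7, 10), (7, 20), (8, 5), (8, 7)]: each key2 value split in two
print(sorted(by_key2))  # [(7, 30), (8, 12)]: one complete group per key2 value
```

Partitioning on the grouping key guarantees that every row of a group lands on the same node, so each node's result is already final.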

satish_valavala · Post by **satish_valavala** » Wed Jan 25, 2006 9:07 am

Thank you all

kumar_s · Post by **kumar_s** » Fri Jan 27, 2006 1:07 pm

Hi,
Just an update...
Once the data is partitioned on Key1, as long as you sort on both keys (Key1 and Key2), you can aggregate on both keys.

-Kumar
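The same kind of hypothetical plain-Python simulation for this point: partition on Key1 with a modulus-style rule, sort each partition on (Key1, Key2), then group on both keys. The rows and the names `partition_on_key1` and `aggregate_on_both_keys` are illustrative assumptions, not DataStage APIs.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (key1, key2, amount). A plain-Python model, not DataStage code.
ROWS = [(1, 7, 10), (2, 7, 20), (1, 7, 3), (1, 8, 4), (3, 8, 5), (2, 7, 1)]
both_keys = itemgetter(0, 1)  # returns the (key1, key2) tuple of a row


def partition_on_key1(rows, n_nodes):
    """Modulus-style partitioning on key1: all rows with the same key1 share a node."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[row[0] % n_nodes].append(row)
    return nodes


def aggregate_on_both_keys(nodes):
    """On each node: sort on (key1, key2), then sum each run of equal keys."""
    output = []
    for node_rows in nodes:
        for (k1, k2), group in groupby(sorted(node_rows, key=both_keys), key=both_keys):
            output.append((k1, k2, sum(r[2] for r in group)))
    return output


result = aggregate_on_both_keys(partition_on_key1(ROWS, 2))
print(sorted(result))  # [(1, 7, 13), (1, 8, 4), (2, 7, 21), (3, 8, 5)]
print(sorted(aggregate_on_both_keys(partition_on_key1(ROWS, 3))) == sorted(result))  # True
```

Because Key1 is part of the grouping key, a (Key1, Key2) group can never span two nodes, so the result does not depend on the number of nodes. Grouping on Key2 alone would not have this guarantee.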

DSXchange
