partitioning data on key1 and aggregating on key2

Post by satish_valavala:

In a PX job, if data is partitioned on key1 and then aggregated on key2, what issues could arise?

Thx
VS

Post by ray.wurlod:

satish_valavala wrote: In a PX job, if data is partitioned on key1 and then aggregated on key2, what issues could arise?
Thx
VS

Are you sorting on key2?

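If not, the Aggregator stage cannot use its sort method and has to use the hash method, which keeps a running result for every group in memory until all the input has been read. It will therefore use a lot more memory than it would with sorted input.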
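To make the memory difference concrete, here is a small Python sketch of the two general aggregation strategies. It is only an analogy, not DataStage code; the row layout `(key2, amount)` and the function names `hash_aggregate` and `sort_aggregate` are hypothetical. The hash version keeps every group open until the input ends, while the sort version only ever accumulates the current group, provided its input is already sorted on key2.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input rows: (key2, amount).
rows = [("b", 2), ("a", 1), ("c", 5), ("a", 3), ("b", 4)]

def hash_aggregate(rows):
    """Hash-style: one running total per distinct key2 stays in memory
    until the whole input has been read."""
    totals = {}
    for key2, amount in rows:
        totals[key2] = totals.get(key2, 0) + amount
    peak_open_groups = len(totals)  # every group is open until the end
    return totals, peak_open_groups

def sort_aggregate(rows_sorted_on_key2):
    """Sort-style: expects input sorted on key2, so each group is complete
    (and could be written out) as soon as key2 changes."""
    results = []
    for key2, group in groupby(rows_sorted_on_key2, key=itemgetter(0)):
        results.append((key2, sum(amount for _, amount in group)))
    peak_open_groups = 1  # only the current group is accumulated at any time
    return results, peak_open_groups

hash_totals, hash_open = hash_aggregate(rows)
sort_totals, sort_open = sort_aggregate(sorted(rows, key=itemgetter(0)))
print(hash_totals)           # {'b': 6, 'a': 4, 'c': 5}
print(sort_totals)           # [('a', 4), ('b', 6), ('c', 5)]
print(hash_open, sort_open)  # 3 1

# Unsorted input splits groups: [('b', 2), ('a', 1), ('c', 5), ('a', 3), ('b', 4)]
print(sort_aggregate(rows)[0])
```

The last call shows why the sort order matters in this sketch: feeding unsorted rows to the sort-based function breaks one key2 value into several groups.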

Partitioning on anything other than key2 means that rows with the same key2 value can end up on different nodes: some on node 1, some on node 2, and so on. Each node aggregates only the rows it holds, so a single key2 value can appear as several partial groups in the final result instead of one. This is almost certainly not desirable.
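A minimal simulation of the problem, assuming two nodes, integer key1 values and a modulus-style split on key1. All data and the helper names `partition` and `aggregate_each_node_on_key2` are hypothetical. Each node sums `amount` per key2 using only its own rows, mimicking one Aggregator instance per partition; CRC32 is used only as a stable stand-in hash for the second run, which partitions on key2 instead.

```python
import zlib
from collections import defaultdict

# Hypothetical rows: (key1, key2, amount).
rows = [
    (1, "x", 10), (2, "x", 20), (3, "y", 5),
    (4, "y", 7), (5, "x", 1), (6, "y", 3),
]
NODES = 2

def partition(rows, node_of):
    """Split rows into NODES partitions using node_of(row) -> node number."""
    parts = defaultdict(list)
    for row in rows:
        parts[node_of(row)].append(row)
    return parts

def aggregate_each_node_on_key2(parts):
    """Sum amount per key2 separately on each node, using only that node's rows."""
    output = []
    for node in sorted(parts):
        totals = defaultdict(int)
        for _key1, key2, amount in parts[node]:
            totals[key2] += amount
        output.extend(totals.items())
    return output

# Modulus-style partitioning on the integer key1.
by_key1 = aggregate_each_node_on_key2(partition(rows, lambda r: r[0] % NODES))
print(by_key1)  # [('x', 20), ('y', 10), ('x', 11), ('y', 5)]

# Hash partitioning on key2 (CRC32 is just a stable stand-in hash).
by_key2 = aggregate_each_node_on_key2(
    partition(rows, lambda r: zlib.crc32(r[1].encode()) % NODES)
)
print(sorted(by_key2))  # [('x', 31), ('y', 15)]
```

Partitioned on key1, x and y each come out twice with partial totals; partitioned on key2, every key2 value lands on exactly one node and is aggregated once.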

Post by satish_valavala:

Thank you all
Regards
VS

Post by kumar_s:

Hi,
Just an update...
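Once the data is partitioned on key1, as long as you sort on both keys (key1 and key2), you can aggregate on both keys. Because every row with a given key1 value lands on the same node, each (key1, key2) group is complete within a single partition.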
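The same idea as a Python sketch, using hypothetical data and helper names (`partition_on_key1`, `sort_and_aggregate`): rows are split on key1 with a modulus on an integer key1, each partition is sorted on (key1, key2) and then aggregated on (key1, key2). The combined parallel output matches a single-node aggregation because no (key1, key2) group is ever split across partitions. Note that this groups on the pair of keys, not on key2 alone.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (key1, key2, amount), with repeated key1 values.
rows = [
    (1, "x", 10), (1, "x", 5), (1, "y", 2), (2, "x", 20),
    (2, "x", 1), (3, "y", 7), (3, "y", 3), (4, "x", 4),
]
NODES = 2

def partition_on_key1(rows, nodes=NODES):
    """Modulus-style split on the integer key1: equal key1 values share a node."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[0] % nodes].append(row)
    return parts

def sort_and_aggregate(part):
    """Sort one partition on (key1, key2), then sum amount per (key1, key2)."""
    ordered = sorted(part, key=itemgetter(0, 1))
    return [
        (keys, sum(amount for _, _, amount in group))
        for keys, group in groupby(ordered, key=itemgetter(0, 1))
    ]

parts = partition_on_key1(rows)
parallel = sorted(
    result for node in sorted(parts) for result in sort_and_aggregate(parts[node])
)
single_node = sorted(sort_and_aggregate(rows))
print(parallel == single_node)  # True
print(parallel)  # [((1, 'x'), 15), ((1, 'y'), 2), ((2, 'x'), 21), ((3, 'y'), 10), ((4, 'x'), 4)]
```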

-Kumar