Join partitioning requirement

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Madhavan VM
Participant
Posts: 33
Joined: Sat Jul 02, 2005 2:27 am
Location: Bangalore

Join partitioning requirement

Post by Madhavan VM »

Hi,

I have to join two datasets based on keys L1 and D1, where L1 is a one up number and D1 is a date.

I believe that for the join to work, the input partitioning on links should be hashed and sorted on the keys.

My input data is coming as sorted on key L1. Hence, is it possible for me to join on L1 and D1 but using Hash partitioning method on the key L1 alone :?: To make it clear: I will be doing hash partitioning on the keys L1 and do sorting on L1 and D1. Would my join be succesfull :?: If not could you explain why :?:

My rationale for this question is: Since the hashing algoritham is used on the main key, L1, all the L1 records would have the same hashed key value and would eventually fall in the same node. Then its a matter of sorting on the keys, L1 and D1, to find the join.

I have used the join all through my jobs with the golden rule that the data should be hashed and sorted on the keys. But, when a colleague of mine prompted me as to why we can't do it on the main key and then sort to get the desired result; I was not able to give a proper explanation.

your response is appreciated.
warm regards,
Ajith GK
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You need to partition on both key columns, so that for any value of the key [L1,D1] all corresponding values will be found on the same partition.

That said, if you partition one input on [L1,D1] and one on [L1] alone, then you would still achieve the principle of key adjacency, because there are no values of [L1] on any other partition for any particular set of [L1,D1] values.

So, yes (ceteris paribus) your join will probably work.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply