I have to join two datasets on the keys L1 and D1, where L1 is a one-up (sequential) number and D1 is a date.
I believe that for the join to work, both input links should be hash-partitioned and sorted on the join keys.
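The reason partitioning matters can be shown with a small simulation. In a partitioned join, each node joins only its own slice of the left input against the same-numbered slice of the right input. A row therefore finds its match only if both rows landed in the same partition. The sketch below is hypothetical: the rows are made-up `(L1, D1, payload)` tuples, `N_PARTS` is an assumed node count, and `zlib.crc32` is a stand-in for whatever hash the engine really uses. It compares hash partitioning on the key against round-robin partitioning, which ignores the key.

```python
from collections import defaultdict
import zlib

N_PARTS = 4  # assumed degree of parallelism (number of partitions/nodes)

def key(row):
    """Join key (L1, D1); rows are hypothetical (L1, D1, payload) tuples."""
    return (row[0], row[1])

def hash_partition(rows, n=N_PARTS):
    """Route each row by a deterministic hash of its key.
    crc32 is only a stand-in, not the engine's real hash function."""
    parts = defaultdict(list)
    for row in rows:
        parts[zlib.crc32(repr(key(row)).encode()) % n].append(row)
    return parts

def round_robin_partition(rows, n=N_PARTS):
    """Deal rows out in turn, ignoring the key entirely."""
    parts = defaultdict(list)
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def join_each_partition(left_parts, right_parts):
    """Inner-join left partition p only against right partition p,
    as each node of a parallel join would."""
    out = []
    for p in set(left_parts) | set(right_parts):
        by_key = defaultdict(list)
        for r in right_parts.get(p, []):
            by_key[key(r)].append(r)
        for l in left_parts.get(p, []):
            for r in by_key.get(key(l), []):
                out.append((l[2], r[2]))
    return out

left = [(1, "2024-01-01", "a"), (1, "2024-01-02", "b"),
        (2, "2024-01-01", "c"), (3, "2024-01-05", "d")]
right = [(3, "2024-01-05", "x"), (2, "2024-01-01", "y"),
         (1, "2024-01-02", "z"), (1, "2024-01-01", "w")]

hashed = join_each_partition(hash_partition(left), hash_partition(right))
dealt = join_each_partition(round_robin_partition(left), round_robin_partition(right))
print(sorted(hashed))  # [('a', 'w'), ('b', 'z'), ('c', 'y'), ('d', 'x')]
print(sorted(dealt))   # []
```

With hashing, equal keys always meet on the same node, so every match is found. With round-robin on this data, every matching pair is split across nodes and the join silently returns nothing.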
My input data already arrives sorted on key L1. Is it therefore possible for me to join on L1 and D1 but hash-partition on the key L1 alone?
My rationale for this question is: since the hashing algorithm is applied to the main key, L1, all records with the same L1 value get the same hash value and therefore land on the same node. After that, it's just a matter of sorting on both keys, L1 and D1, to perform the join.
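The rationale above can be checked with a minimal simulation, assuming a simple model of a parallel join: hash-partition both inputs on L1 only, sort each partition on (L1, D1), then run an inner sort-merge join per partition. Everything here is hypothetical: the `(L1, D1, payload)` rows, the `N_PARTS` node count, and `zlib.crc32` as a stand-in hash. The result is compared against a naive join over the unpartitioned data.

```python
from collections import defaultdict
import zlib

N_PARTS = 4  # assumed number of partitions (nodes)

def key(row):
    """Full join key (L1, D1); rows are hypothetical (L1, D1, payload) tuples."""
    return (row[0], row[1])

def partition_on_l1(rows, n=N_PARTS):
    """Hash-partition on L1 alone (crc32 is a stand-in hash function)."""
    parts = defaultdict(list)
    for row in rows:
        parts[zlib.crc32(str(row[0]).encode()) % n].append(row)
    return parts

def merge_join(left, right):
    """Inner sort-merge join; both inputs must already be sorted on (L1, D1)."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair it with
            # every left row that has the same key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            while i < len(left) and key(left[i]) == lk:
                for r in right[j:j_end]:
                    out.append((left[i][2], r[2]))
                i += 1
            j = j_end
    return out

def partitioned_join(left, right, n=N_PARTS):
    """Partition on L1, sort each partition on (L1, D1), merge-join per partition."""
    lp, rp = partition_on_l1(left, n), partition_on_l1(right, n)
    out = []
    for p in set(lp) | set(rp):
        l_sorted = sorted(lp.get(p, []), key=key)
        r_sorted = sorted(rp.get(p, []), key=key)
        out.extend(merge_join(l_sorted, r_sorted))
    return out

left = [(1, "2024-01-02", "b"), (2, "2024-01-01", "c"), (1, "2024-01-01", "a"),
        (3, "2024-01-05", "d"), (1, "2024-01-01", "a2")]
right = [(1, "2024-01-01", "w"), (3, "2024-01-05", "x"), (1, "2024-01-02", "z"),
         (2, "2024-01-01", "y"), (4, "2024-01-01", "v")]

# Reference: naive join over the whole, unpartitioned data.
reference = [(l[2], r[2]) for l in left for r in right if key(l) == key(r)]
print(sorted(partitioned_join(left, right)))
# [('a', 'w'), ('a2', 'w'), ('b', 'z'), ('c', 'y'), ('d', 'x')]
print(sorted(partitioned_join(left, right)) == sorted(reference))  # True
```

In this model the result matches because every row sharing an L1 value, and hence every row sharing the full (L1, D1) key, ends up in the same partition. The within-partition sort on both keys then lets the merge find each match. One design consideration is data skew: partitioning on a subset of the keys only spreads rows evenly if that subset has many distinct values, which a one-up number like L1 would.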
I have used the join throughout my jobs, following the golden rule that the data should be hashed and sorted on the keys. But when a colleague asked me why we can't partition on the main key alone and then sort to get the desired result, I was not able to give a proper explanation.
Your response is appreciated.