How does partitioning work across more than one job?
Posted: Mon Jul 14, 2008 7:12 am
This question is about partitioning basics. Please let me know the answer, or point me to where I can find it.
Here is the situation:
I have 2 extract jobs. Each job populates 1 dataset. Extract 1 fetches 17 million rows and Extract 2 fetches 7 million rows. Both datasets have a key column named 'KEY'.
I have used hash partitioning on KEY in both extract jobs. Can I expect a particular key value to go to the same partition in both datasets? If yes, how? (I might run the 2 extract jobs on 2 different days.)
(I think the answer is 'No' and I will have to repartition the data in subsequent transform jobs...)
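To make the question concrete, here is a minimal Python sketch of how I understand hash partitioning. It is only an illustration under my own assumptions: partition_for and partition_dataset are hypothetical names, CRC32 stands in for whatever hash the engine really uses, and the partition count of 4 is arbitrary. In this toy model a key lands in the same partition in both datasets only because both runs use the same hash function and the same number of partitions; I do not know whether the real jobs guarantee that.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition number with a deterministic hash.

    CRC32 is an assumption for illustration only; it is not the hash
    used by any particular ETL engine.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_dataset(keys, num_partitions):
    """Distribute keys into num_partitions lists by hashing each key."""
    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        partitions[partition_for(key, num_partitions)].append(key)
    return partitions


# Two toy "extracts" that share some key values (made-up data).
extract1 = [f"K{i}" for i in range(1000)]
extract2 = [f"K{i}" for i in range(0, 1000, 3)]

NUM_PARTITIONS = 4  # assumed to be identical for both jobs
p1 = partition_dataset(extract1, NUM_PARTITIONS)
p2 = partition_dataset(extract2, NUM_PARTITIONS)

# Check whether every shared key sits in the same partition number
# in both partitioned datasets.
same_partition = all(
    key in p1[partition_for(key, NUM_PARTITIONS)]
    and key in p2[partition_for(key, NUM_PARTITIONS)]
    for key in extract2
)
print("shared keys co-located:", same_partition)
```

If the partition count differs between the two runs (say 4 and 8), the modulo changes, and the same key can land in different partition numbers in this sketch.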
Thanks