How does Partitioning work for more than 1 job

Posted: Mon Jul 14, 2008 7:12 am
by MVL
This question is about partitioning basics. Please let me know the answer, or point me to where I can find it.

Here is the situation:
I have 2 extract jobs. Each job populates 1 dataset. Extract 1 fetches 17 million rows. Extract 2 fetches 7 million rows. Both datasets have a key column named 'KEY'.

I have used hash partitioning on KEY in both extract jobs. Can I expect a particular key value to go to the same partition in both datasets? If yes, how? (I might run the 2 extract jobs on 2 different days.)
(I think the answer is 'No' and I will have to repartition the data in subsequent transform jobs...)

Thanks

Posted: Mon Jul 14, 2008 8:32 am
by ArndW
The same key, using the same hashing algorithm and the same APT_CONFIG_FILE configuration file, will go to the same partition number in both datasets.
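
To illustrate the point above, here is a minimal Python sketch of hash partitioning. It is not DataStage's actual hashing algorithm: `zlib.crc32` is a stand-in, and `partition_for`, `partition_rows`, and the sample key values are hypothetical names and data chosen for the example. The partition count plays the role of the number of nodes defined in the configuration file.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Stand-in hash partitioner: deterministic hash of the key, modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_rows(keys, num_partitions):
    """Group key values by the partition number they are assigned to."""
    partitions = {p: [] for p in range(num_partitions)}
    for key in keys:
        partitions[partition_for(key, num_partitions)].append(key)
    return partitions


# Two hypothetical extracts, possibly run on different days, sharing some key values.
extract_1_keys = ["A100", "B200", "C300", "D400"]
extract_2_keys = ["C300", "A100", "E500"]

# Same hash function and same partition count, i.e. the same configuration.
p1 = partition_rows(extract_1_keys, 4)
p2 = partition_rows(extract_2_keys, 4)

# Every key present in both extracts lands in the same partition number in both.
shared = set(extract_1_keys) & set(extract_2_keys)
for key in sorted(shared):
    print(key, "-> partition", partition_for(key, 4))
```

The assignment depends only on the key value, the hash function, and the partition count, not on when the job runs or which other rows are present. If the two jobs ran with different partition counts, the modulo would change and a shared key would no longer be guaranteed to land in the same partition number, which is the case where repartitioning in the downstream job would be needed.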

Posted: Mon Jul 14, 2008 9:47 pm
by MVL
Thanks for the reply.

The 2 jobs would not have the same config file. So basically I will need to repartition the 2 datasets in the transform job.

Posted: Mon Jul 14, 2008 10:00 pm
by ray.wurlod
Why would two jobs not have the same configuration file? Are you stating this as a general principle, or as a specific case?

Posted: Mon Jul 14, 2008 10:41 pm
by MVL
OK. I stated it as a general rule, but from your message I think that's not the case. Can 2 jobs share the same RT_CONFIG file? Is this file in any way related to hash partitioning?

Thanks

Posted: Mon Jul 14, 2008 10:51 pm
by Minhajuddin
Do you have 2 datasets or are you appending one dataset to the other?
If you have 2 datasets which are not being appended, the same keys will be in the same partitions in both.

Posted: Tue Jul 15, 2008 12:45 am
by ray.wurlod
MVL wrote:
OK. I stated it as a general rule, but from your message I think that's not the case. Can 2 jobs share the same RT_CONFIG file? Is this file in any way related to hash partitioning?
Thanks

The configuration file the rest of us have been talking about is the one whose pathname is specified by the APT_CONFIG_FILE environment variable, not the RT_CONFIGnnn hashed file in the Repository (which has nothing whatsoever to do with hash partitioning, except that the specification in your design is stored in RT_CONFIGnnn).