How does partitioning work across more than one job?

MVL · Post by **MVL** » Mon Jul 14, 2008 7:12 am

This question is about partitioning basics. Please let me know the answer, or point me to where I can find it.

Here is the situation:
I have 2 extract jobs. Each job populates 1 dataset. Extract 1 fetches 17 million rows. Extract 2 fetches 7 million rows. Both datasets have a key column, 'KEY'.

I have used hash partitioning on KEY in both extract jobs. Can I expect a particular key value to go to the same partition in both datasets? If yes, how? (I might run the 2 extract jobs on 2 different days.)
(I think the answer is 'No' and I will have to repartition the data in the subsequent transform jobs...)

Thanks

ArndW · Post by **ArndW** » Mon Jul 14, 2008 8:32 am

The same key, using the same hashing algorithm and the same configuration file (the one named by APT_CONFIG_FILE), will go to the same partition number in both datasets.
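To illustrate the point above, here is a minimal Python sketch of why this holds. It is not DataStage code: `partition_for` is a hypothetical helper, and CRC32 is only a stand-in for whatever hash function the engine really uses. The assumption is that a hash partitioner behaves like "hash of the key, modulo the number of partitions", where the number of partitions comes from the nodes defined in the configuration file.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Stand-in hash partitioner (assumption: hash(key) mod partition count).

    CRC32 is used only because it is deterministic across runs;
    it is NOT the hash function DataStage uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = ["A100", "B200", "C300", "D400"]

# Two independent "extract jobs", possibly run on different days,
# both hashing on KEY with the same node count (here 4).
extract1 = {k: partition_for(k, 4) for k in keys}
extract2 = {k: partition_for(k, 4) for k in keys}

# Deterministic hash + same partition count => identical placement.
same_layout = extract1 == extract2

# The same keys under a configuration with a different node count (here 3).
# Placement is computed modulo 3 instead of 4, so it is not guaranteed
# to match the 4-node layout.
extract2_other_config = {k: partition_for(k, 3) for k in keys}
```

Under these assumptions, `same_layout` is true no matter when each job runs, because nothing in the calculation depends on the job or the run date. Change the node count for one of the jobs and the guarantee disappears, which is why the configuration file matters.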

MVL · Post by **MVL** » Mon Jul 14, 2008 9:47 pm

Thanks for the reply.

The 2 jobs would not have the same config file. So basically I will need to repartition the 2 datasets in the transform job.

ray.wurlod · Post by **ray.wurlod** » Mon Jul 14, 2008 10:00 pm

Why would two jobs not have the same configuration file? Are you stating this as a general principle, or as a specific case?

MVL · Post by **MVL** » Mon Jul 14, 2008 10:41 pm

OK. I stated it as a general rule, but looking at your message I think that's not the case. Can 2 jobs share the same RT_CONFIG file? Is this file in any way related to hash partitioning?

Thanks

Minhajuddin · Post by **Minhajuddin** » Mon Jul 14, 2008 10:51 pm

Do you have 2 datasets or are you appending one dataset to the other?
If you have 2 datasets which are not being appended, the same key values will be in the same partitions in both.

ray.wurlod · Post by **ray.wurlod** » Tue Jul 15, 2008 12:45 am

MVL wrote: OK. I stated it as a general rule, but looking at your message I think that's not the case. Can 2 jobs share the same RT_CONFIG file? Is this file in any way related to hash partitioning?
Thanks

The configuration file that the rest of us have been talking about is the one whose pathname is specified by the APT_CONFIG_FILE environment variable, not the RT_CONFIGnnn hashed file in the Repository (which has nothing whatsoever to do with hash partitioning, except that the specification in your design is stored in RT_CONFIGnnn).