How does Partitioning work for more than 1 job

MVL
Premium Member
Posts: 33
Joined: Wed Apr 30, 2008 5:43 am

Post by MVL »

This question is about partitioning basics. Please let me know the answer, or point me to where I can find it.

Here is the situation:
I have two extract jobs. Each job populates one dataset. Extract 1 fetches 17 million rows; Extract 2 fetches 7 million rows. Both datasets have a key column named 'KEY'.

I have used hash partitioning on KEY in both extract jobs. Can I expect a particular key value to go to the same partition in both datasets? If yes, how? (I might run the two extract jobs on two different days.)
(I think the answer is 'No', and that I will have to repartition the data in the subsequent transform jobs...)

Thanks
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The same key using the same hashing algorithm with the same APT_CONFIG file will go to the same partition number on both files.
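
To illustrate the point above, here is a minimal sketch of why hash partitioning is repeatable across jobs and across days. It assumes the partition number is simply a deterministic hash of the key modulo the number of partitions, and that the number of partitions comes from the configuration file. `zlib.crc32` is only a stand-in (Python's built-in `hash()` on strings changes between processes, so it would not model this); it is not the hash function DataStage uses internally, and the key values and helper names are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition number.

    zlib.crc32 is a stand-in chosen because it is deterministic across runs;
    it is NOT the hash function DataStage uses internally.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_map(keys, num_partitions):
    """Return {key: partition number} for one dataset."""
    return {k: partition_for(k, num_partitions) for k in keys}


# Hypothetical key values standing in for the two extracts.
extract1_keys = [f"KEY{i}" for i in range(1700)]
extract2_keys = [f"KEY{i}" for i in range(1000, 1700)]
shared = set(extract1_keys) & set(extract2_keys)

# Same partition count for both jobs (same configuration file).
ds1 = partition_map(extract1_keys, 4)
ds2 = partition_map(extract2_keys, 4)
aligned = all(ds1[k] == ds2[k] for k in shared)

# Different partition counts (different configuration files).
ds2_other = partition_map(extract2_keys, 3)
mismatches = sum(1 for k in shared if ds1[k] != ds2_other[k])

print("aligned with same partition count:", aligned)
print("keys in different partitions with 4 vs 3 partitions:", mismatches)
```

With the same partition count, every shared key lands in the same partition number in both datasets, regardless of when each job runs. Once the partition counts differ, that guarantee is lost and a downstream job would need to repartition.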
MVL
Premium Member
Posts: 33
Joined: Wed Apr 30, 2008 5:43 am

Post by MVL »

Thanks for the reply.

The two jobs would not have the same config file, so basically I will need to repartition the two datasets in the transform job.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Why would two jobs not have the same configuration file? Are you stating this as a general principle, or as a specific case?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
MVL
Premium Member
Posts: 33
Joined: Wed Apr 30, 2008 5:43 am

Post by MVL »

OK. I stated it as a general rule, but from your message I think that's not the case. Can two jobs share the same RT_CONFIG file? Is this file in any way related to hash partitioning?

Thanks
Minhajuddin
Participant
Posts: 467
Joined: Tue Mar 20, 2007 6:36 am
Location: Chennai

Post by Minhajuddin »

Do you have two separate datasets, or are you appending one dataset to the other?
If you have two datasets that are not being appended, the same key values will be in the same partitions in both.
Minhajuddin
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

MVL wrote: OK. I stated it as a general rule, but from your message I think that's not the case. Can two jobs share the same RT_CONFIG file? Is this file in any way related to hash partitioning?
Thanks

The configuration file the rest of us have been talking about is the one whose pathname is specified by the APT_CONFIG_FILE environment variable, not the RT_CONFIGnnn hashed file in the Repository. RT_CONFIGnnn has nothing whatsoever to do with Hash partitioning, except that the specification in your design is stored in it.