Hash Partitioning

bi_fujitsu
Premium Member
Posts: 46
Joined: Tue Mar 20, 2007 3:30 am
Location: India

Hash Partitioning

Post by bi_fujitsu »

Hi

I am working on a project where we are given standard job templates to work from. I have a question about one of the templates and would like your help clearing it up.

In job 1 I create a dataset using hash partitioning, and this dataset is the source for job 2. Job 2 uses it to create two more datasets, one for inserts and one for updates. In job 2 the source dataset is followed by a Sort stage with partitioning set to "Auto", and after that there are three more stages whose partitioning method is "Same". My question: will the Sort stage, with its partitioning method set to "Auto", repartition the data that was hash partitioned in the previous job?


Please advise.
Mike3000
Participant
Posts: 24
Joined: Mon Mar 26, 2007 9:16 am

Post by Mike3000 »

Just make sure that in job 1 the [Preserve Partitioning] flag is [Set].
Then, when one of the stages in job 2 uses [Auto] partitioning, it should
preserve the [Hash] partitioning used in job 1.
And because your three remaining stages use [Same] partitioning,
the [Hash] partitioning is carried through to the end of the job's execution.
sud
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Re: Hash Partitioning

Post by sud »

If you have set 'Auto', it should not repartition. However, you can check by looking at the dump score (search the forum for "dump score"). If you find that superfluous repartitioning is happening, you can switch it off using the APT_NO_PART_INSERTION environment variable.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What happens is entirely dependent upon the requirements of the downstream operator. If it's a regular (parallel) operator, with no requirement for partitioned input, Auto should use "Same". If there's no repartitioning icon on the link, you can be comfortable that this is the case. But if you want to be sure, the score is the only place to confirm what is happening.
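
To illustrate why it matters whether the hash partitioning is carried through or replaced, here is a minimal Python sketch of the idea. It is not DataStage code and does not reproduce the engine's hash function; the function names, the `cust_id` key and the use of CRC32 are all assumptions made for illustration only.

```python
from zlib import crc32


def hash_partition(rows, key, num_partitions):
    """Send each row to a partition chosen by hashing its key value."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        # CRC32 is a stand-in hash (assumption), chosen because it is deterministic.
        idx = crc32(str(row[key]).encode("utf-8")) % num_partitions
        parts[idx].append(row)
    return parts


def same_partition(parts):
    """Model of "Same": rows stay in the partition they arrived in."""
    return [list(p) for p in parts]


def round_robin_partition(parts, num_partitions):
    """Model of a key-unaware repartition: deal rows out in turn."""
    flat = [row for p in parts for row in p]
    out = [[] for _ in range(num_partitions)]
    for i, row in enumerate(flat):
        out[i % num_partitions].append(row)
    return out


def keys_colocated(parts, key):
    """True if every key value appears in exactly one partition."""
    seen = {}
    for idx, p in enumerate(parts):
        for row in p:
            if seen.setdefault(row[key], idx) != idx:
                return False
    return True


# Hypothetical data: 20 rows, 5 distinct keys, 4 partitions.
rows = [{"cust_id": i % 5, "amount": i} for i in range(20)]
hashed = hash_partition(rows, "cust_id", 4)          # what job 1 writes
carried = same_partition(hashed)                     # "Same" in job 2
reshuffled = round_robin_partition(hashed, 4)        # a key-unaware repartition

print(keys_colocated(hashed, "cust_id"))      # True
print(keys_colocated(carried, "cust_id"))     # True
print(keys_colocated(reshuffled, "cust_id"))  # False
```

As long as every link after the dataset behaves like `same_partition`, rows sharing a key stay together, which is what a key-based sort relies on. Any repartitioning that ignores the key loses that guarantee, and that is what a superfluous partitioner in the score would indicate.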