Hi
I am working on a project where we are given standard job templates to work from. I have a doubt about one of the templates and wanted to share it with you to clear it up.
In job no. 1 I create a dataset using hash partitioning, and that dataset is the source for job no. 2. In job no. 2 I use it to create two more datasets, one for inserts and one for updates. In job no. 2 the source dataset is followed by a Sort stage whose partitioning is set to "Auto", and after that there are three more stages whose partitioning method is "Same". My question: will the Sort stage, with its partitioning method set to "Auto", re-partition the data that was hash partitioned in the previous job?
Please advise.
Hash Partitioning
-
- Premium Member
- Posts: 46
- Joined: Tue Mar 20, 2007 3:30 am
- Location: India
Just make sure that in J#1 the flag [Preserve Partitioning] is [Set].
Then, in J#2, when one of the stages uses [Auto] partitioning, it will
definitely preserve the [Hash] partitioning method used in J#1.
And because you have three more stages with [Same] partitioning,
the [Hash] partitioning will carry through to the end of the job's execution.
Re: Hash Partitioning
If you have set 'Auto', it should not. However, you can check by looking at the dump score (search the forum for "dump score"). If you find that superfluous repartitioning is happening, you can switch it off using the APT_NO_PART_INSERTION environment variable.
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
What happens is entirely dependent upon the requirements of the downstream operator. If it's a regular (parallel) operator, with no requirement for partitioned input, Auto should use "Same". If there's no repartitioning icon on the link, you can be comfortable that this is the case. But if you want to be sure, the score is the only place to confirm what is happening.
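To illustrate why it matters whether "Auto" resolves to "Same" or to a repartition, here is a toy Python model. It is only a sketch and not how the parallel engine works internally: the function names (hash_partition, same, round_robin, keys_colocated), the cust_id column, and the use of CRC32 as the hash are all assumptions made for illustration. It shows that passing hash-partitioned data through "Same" keeps every key value in a single partition, while a key-blind repartition scatters them.

```python
import zlib


def hash_partition(rows, key, n_parts):
    """Place each row in a partition chosen by a hash of its key column."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        parts[h % n_parts].append(row)
    return parts


def same(parts):
    """Model of 'Same': every row stays in the partition it arrived in."""
    return [list(p) for p in parts]


def round_robin(parts):
    """Model of a key-blind repartition: deal rows out one at a time."""
    n = len(parts)
    out = [[] for _ in range(n)]
    i = 0
    for p in parts:
        for row in p:
            out[i % n].append(row)
            i += 1
    return out


def keys_colocated(parts, key):
    """Return True if no key value appears in more than one partition."""
    seen = {}
    for idx, p in enumerate(parts):
        for row in p:
            if seen.setdefault(row[key], idx) != idx:
                return False
    return True


# Hypothetical sample data: 20 rows spread over 5 customer ids.
rows = [{"cust_id": i % 5, "amount": i} for i in range(20)]

hashed = hash_partition(rows, "cust_id", 4)   # what job 1 would write
after_same = same(hashed)                     # job 2 with no repartition
after_rr = round_robin(hashed)                # job 2 with a repartition

print("hash partitioned:", keys_colocated(hashed, "cust_id"))
print("after Same:      ", keys_colocated(after_same, "cust_id"))
print("after repartition:", keys_colocated(after_rr, "cust_id"))
```

In a real job the score remains the only authoritative record of which partitioner was actually inserted on each link; the model above only shows what is at stake if one is.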
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.