Auto partitioning
Hi all,
I have a job that runs with Auto partitioning, and none of its stages would choose hash or DB2 partitioning.
Is it better to leave everything on Auto, or to set Round Robin on the first stage and Same on the rest of the stages?
My reasoning is that by specifying Round Robin and Same explicitly, I save the time the parallel engine spends evaluating Auto and resolving it to Round Robin. Is this correct, or is there no difference in performance?
Thanks in advance.
It should not make a performance difference either way.
If you are looking for a performance improvement for your job (besides running on more nodes), you could check whether you are able to read from your source in parallel.
If you want to know whether it is good development practice to set partitioning manually in a job or to rely on Auto, then I think opinions differ.
An instructor of mine (some time ago) recommended setting it manually because, as he said, "then you don't depend on IBM keeping the same default partitioning methods" (however, I don't think they would dare to change them). I would say it depends on how experienced the developer is, but when the logic depends on a specific partitioning method, I would like the developers I support to set it manually. (Later we might go through the job score and the monitor together in order to avoid repartitioning and unnecessary sorts in the job.)
_________________
- Susanne
Hi,
My question is direct. If I set Auto, my job will choose Round Robin first and then Same, because of the kind of stages it contains. I am not talking about a big performance difference. If I set the partitioning methods explicitly, will the job skip the parallel engine's check that resolves Auto to Round Robin at the first stage?
Please tell me whether there is even a small difference in logic or performance if I set the partitioning methods explicitly, or confirm that there is none.
Thanks in advance.
Last edited by dsscholar on Thu Sep 15, 2011 4:53 am, edited 1 time in total.
Once the job is compiled, the engine knows the stages, partitioning methods, and so on. If you selected a specific method, the job will run using that method; otherwise it will use Auto. If you need to sort or aggregate the data, for example, you need a specific partitioning method. So your requirements decide the partitioning method, and the right method gives you the correct result.
Auto does not necessarily mean Round Robin every time, but mostly it does.
Basically, this is work the engine does regardless of whether the setting is Auto or user-defined.
DS User
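To illustrate the point above about aggregation needing a specific partitioning method, here is a minimal Python sketch. It is a toy model, not DataStage code: the helper names (`round_robin`, `hash_partition`, `sum_per_partition`) and the sample rows are invented for illustration, and each "partition" is just a Python list that is aggregated on its own.

```python
from collections import defaultdict
from zlib import crc32

def round_robin(rows, n):
    # Deal rows out to n partitions in turn, ignoring their content.
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key):
    # Send every row with the same key value to the same partition.
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[crc32(str(row[key]).encode()) % n].append(row)
    return parts

def sum_per_partition(parts, key, value):
    # Aggregate each partition independently; no partition sees another's rows.
    out = []
    for part in parts:
        totals = defaultdict(int)
        for row in part:
            totals[row[key]] += row[value]
        out.extend(sorted(totals.items()))
    return out

# Hypothetical sample data: (customer, amount)
rows = [{"cust": c, "amt": a}
        for c, a in [("A", 10), ("A", 5), ("B", 7), ("B", 1), ("A", 3)]]

rr = sum_per_partition(round_robin(rows, 2), "cust", "amt")
hp = sum_per_partition(hash_partition(rows, 2, "cust"), "cust", "amt")

print("round robin:", rr)   # each customer shows up once per partition
print("hash on key:", sorted(hp))  # one total per customer
```

With round robin, rows for the same customer land in both partitions, so each partition emits its own partial total. With key-based partitioning, all rows for a customer are in one partition and the total comes out once.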
If you set partitioning unwisely you will get incorrect results and possibly degraded performance as well.
(Auto) is guaranteed to give correct results. It may not be optimal in all cases (for example Entire on reference inputs to Lookup stages, Hash in other cases where key-based partitioning is needed) but it will always work.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
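The remark above about Entire on reference inputs to Lookup stages can be pictured with a small Python sketch. This is a toy model under stated assumptions, not how the parallel engine is implemented: the functions `round_robin`, `entire`, and `lookup` and the sample data are hypothetical, and a "partition" is simply a list that can only see its own slice of the reference data.

```python
def round_robin(rows, n):
    # Deal rows out to n partitions in turn.
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def entire(rows, n):
    # Give every partition a full copy of the rows.
    return [list(rows) for _ in range(n)]

def lookup(stream_parts, ref_parts):
    # Each partition looks up its stream keys in its own reference slice only.
    matched, missed = [], []
    for stream, ref in zip(stream_parts, ref_parts):
        table = dict(ref)
        for key in stream:
            if key in table:
                matched.append((key, table[key]))
            else:
                missed.append(key)
    return matched, missed

# Hypothetical reference data and stream keys
reference = [(1, "one"), (2, "two"), (3, "three"), (4, "four")]
stream = [2, 1, 4, 3]

stream_parts = round_robin(stream, 2)
_, missed_rr = lookup(stream_parts, round_robin(reference, 2))
_, missed_entire = lookup(stream_parts, entire(reference, 2))

print("reference split round robin, missed keys:", missed_rr)
print("reference copied to every partition, missed keys:", missed_entire)
```

When the reference rows are split across partitions without regard to the key, a stream row can sit in a partition that does not hold its match. Copying the whole reference set to every partition avoids that regardless of how the stream is partitioned, at the cost of holding the reference data once per partition.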
Sura,
I understand all of this. I just have a job with source ---> filter ---> copy ---> transformer ---> target. Please read this carefully. In one run I set Auto on all the stages. In another run I set Round Robin on the first partitioning setting and Same on the rest. I understand that setting a partitioning method wrongly will give incorrect results. But if I know that Auto will choose Round Robin and Same, I can specify them directly. How does the engine check this? If the setting is Auto, does it examine the stage and choose the correct partitioning? Is some check happening here or not, and am I eliminating it by explicitly setting the required partitioning, or does it happen in a different way? That is my question. Even if it takes only nanoseconds, I want to know how it happens and whether it happens at all.
Thanks in advance.
I know what you mean. I have not seen any documentation (if I am right) that answers those questions, because they concern DataStage architecture and how DataStage behaves internally. Ray or some other real guru might know.
But my guess (only a guess) is that, whether it is Auto or any other partitioning method, this information is stored somewhere and the engine acts on that entry. It may be something like the configuration file mechanism.
For example, if it is Auto,
it might have an entry Round Robin, Same, Same.
Whereas if you choose Hash, Same, Same,
it might have an entry Hash, Same, Same.
My personal assumption is that when you run the job, the engine reads this information and uses those options to run the job.
Let us wait for a proper answer.
DS User
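The guess above can be written down as a tiny Python sketch. It only models that guess and is not a description of documented engine internals: the `resolve` function, the method names, and the rule "Round Robin on the first link, Same afterwards" are all assumptions made for illustration, using the four links of the source, filter, copy, transformer, target job mentioned earlier in the thread.

```python
def resolve(methods):
    # Replace every "auto" with a concrete method once, before any row is processed.
    resolved = []
    for i, method in enumerate(methods):
        if method == "auto":
            # Hypothetical rule for a job with no key-based stages:
            # spread rows on the first link, keep them in place afterwards.
            method = "round_robin" if i == 0 else "same"
        resolved.append(method)
    return resolved

# One entry per link in the job
auto_plan = resolve(["auto", "auto", "auto", "auto"])
manual_plan = resolve(["round_robin", "same", "same", "same"])

print(auto_plan)
print(manual_plan)
print("identical plans:", auto_plan == manual_plan)
```

Under this assumption the resolution happens once per run rather than once per row, and both settings end up with the same list of methods, so any difference would be limited to that one-time step. Whether the real engine works this way is something only the vendor documentation or support can confirm.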
Do you not have an official support provider you can ask? As a user I'm not in a position to know exactly how things work under the covers. I could guess but I'd rather not. Why not ping support and then post back here whatever you find?
-craig
"You can never have too many knives" -- Logan Nine Fingers