Auto partitioning

dsscholar
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Auto partitioning

Post by dsscholar »

Hi all,

I have a job that runs with Auto partitioning, and none of its stages need hash or DB2 partitioning.

Is it better to leave it on Auto, or to choose Round Robin for the first stage and Same for the rest of the stages?

My thinking is that by specifying Round Robin and Same explicitly, I save the time the parallel engine spends evaluating Auto and resolving it to Round Robin. Is this correct, or is there no change in performance?
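
For readers unfamiliar with the two methods named in the question, here is a minimal Python sketch of what Round Robin followed by Same does to the rows. It is an illustration only, not DataStage code: the function names `round_robin` and `same` are made up for this example, and it says nothing about how the engine resolves Auto internally.

```python
def round_robin(rows, num_partitions):
    """Deal rows to partitions in turn, like dealing cards."""
    partitions = [[] for _ in range(num_partitions)]
    for index, row in enumerate(rows):
        partitions[index % num_partitions].append(row)
    return partitions


def same(partitions):
    """Keep every row in the partition it already occupies (no movement)."""
    return [list(partition) for partition in partitions]


rows = list(range(10))
first_stage = round_robin(rows, 4)   # rows are spread evenly over 4 partitions
next_stage = same(first_stage)       # downstream stage inherits that layout
print(first_stage)
print(next_stage == first_stage)
```

The point of the sketch is that Same does no redistribution at all, so once the first link has spread the rows, the later links add no partitioning work of their own.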

Thanks in advance.
suse_dk
Participant
Posts: 93
Joined: Thu Aug 11, 2011 6:18 am
Location: Denmark

Post by suse_dk »

It should not make a performance difference either way.

If you are looking for a performance improvement for your job (besides running on more nodes), then you could check whether you are able to read from your source in parallel.

If you want to know whether it is good development practice to set partitioning manually in a job or to rely on Auto, then I think there are different opinions.
An instructor of mine (some time ago) recommended setting it manually because, as he said, "then you don't depend on IBM keeping the same default partitioning methods" (however, I don't think they would dare to change them). I would say that it depends on how experienced the developer is, but when the logic depends on a specific partitioning method, I would like the developers I support to set it manually. (Later we might go through the job score and the monitor together in order to avoid repartitioning and unnecessary sorts in the job.)
_________________
- Susanne
dsscholar
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Post by dsscholar »

Hi,

My question is direct. If I set Auto, my job will choose Round Robin first and then Same for the remaining stages, because of the kind of stages involved. I am not talking about a big performance difference. If I set the partitioning methods explicitly, will the job skip the check the parallel engine makes to resolve Auto to Round Robin on the first link?

Please tell me whether there is even a small difference in logic or performance if I set the partitioning methods explicitly, or confirm that there is none.

Thanks in advance.
Last edited by dsscholar on Thu Sep 15, 2011 4:53 am, edited 1 time in total.
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

Once the job is compiled, the engine knows the stages, partitioning methods, and so on. If you selected a specific method, the job runs with that method; otherwise it uses Auto. If you need to sort or aggregate the data, you need a specific partitioning method. So your requirement decides the partitioning method, and the right method gives you the correct result.

With Auto it is not necessarily Round Robin every time, but mostly.

Basically, this is work the engine does regardless of whether the setting is Auto or user-defined.

DS User
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you set partitioning unwisely you will get incorrect results and possibly degraded performance as well.

(Auto) is guaranteed to give correct results. It may not be optimal in all cases (for example Entire on reference inputs to Lookup stages, Hash in other cases where key-based partitioning is needed) but it will always work.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
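
The warning above about unwisely chosen partitioning can be made concrete with a small Python sketch. It is a toy model, not DataStage code: the helper names, the `cust` column, and the use of `zlib.crc32` as the hash are all assumptions made for this illustration. It shows why key-based work such as a per-key count needs key-based partitioning: Round Robin can scatter one key over several partitions, so each partition only sees part of that key's rows.

```python
from collections import Counter
import zlib


def round_robin_partition(rows, n):
    """Deal rows to n partitions in turn, ignoring their content."""
    partitions = [[] for _ in range(n)]
    for index, row in enumerate(rows):
        partitions[index % n].append(row)
    return partitions


def hash_partition(rows, n, key):
    """Send every row with the same key value to the same partition."""
    partitions = [[] for _ in range(n)]
    for row in rows:
        bucket = zlib.crc32(str(row[key]).encode()) % n
        partitions[bucket].append(row)
    return partitions


def count_per_partition(partitions, key):
    """Each partition counts only its own rows, as parallel workers would."""
    return [Counter(row[key] for row in part) for part in partitions]


def keys_split_across_partitions(partitions, key):
    """Return the key values that appear in more than one partition."""
    seen = Counter()
    for part in partitions:
        for value in {row[key] for row in part}:
            seen[value] += 1
    return sorted(value for value, count in seen.items() if count > 1)


rows = [{"cust": c} for c in ["A", "A", "B", "B", "C", "C"]]
rr = round_robin_partition(rows, 2)
hashed = hash_partition(rows, 2, "cust")

print(keys_split_across_partitions(rr, "cust"))      # every key is split
print(keys_split_across_partitions(hashed, "cust"))  # no key is split
print(count_per_partition(rr, "cust"))               # partial counts per partition
```

With the Round Robin layout each partition reports a partial count for every customer, which is the kind of incorrect result a wrong manual setting can produce.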
dsscholar
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Post by dsscholar »

Sura,

I understand all of this logic. I just have a job with source ---> filter ---> copy ---> transformer ---> target. Please read this carefully. In one run I set Auto on all the stages. In another run I set Round Robin on the first partitioning setting and Same on the rest. I understand that if I set specific partitioning methods wrongly, I will get incorrect results. But if I know that Auto will choose Round Robin and Same, I can specify them directly. How does the engine check this? If it is Auto, does it inspect the stage and choose the correct partitioning? Is some check happening here or not, and am I eliminating it by explicitly setting the required partitioning? That is my question: am I eliminating that check, or does it happen in a different way? Even if it takes nanoseconds, I want to know how it happens and whether it happens at all.


Thanks in advance.
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

I know what you mean. I have not seen any documentation (if I am right) that answers those questions, because they are about DataStage architecture and how the engine behaves internally. Ray or another real guru might know.

But my guess (only a guess) is that whether it is Auto or any other partitioning method, this information is stored somewhere and the engine acts on that entry. It may work something like the configuration file.

Say for example if it is auto,

It might have an entry Round Robin, Same, Same

Whereas if you choose Hash, same, same

It might have an entry Hash, same, same

When you run the job, the engine checks this information and uses those options to run the job. That is my personal assumption.

Let us wait for a proper answer.
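
The guess above can be pictured with a toy Python model. This is purely an illustration of the guess in this post, not a description of DataStage internals: `resolve_plan` and `toy_auto_choice` are hypothetical names, and the rule "Round Robin first, Same afterwards" is taken from the example entries above rather than from any documentation.

```python
def toy_auto_choice(position):
    """Hypothetical rule from the post: round robin on the first link, same afterwards."""
    return "round robin" if position == 0 else "same"


def resolve_plan(stage_settings, auto_choice):
    """Replace every 'auto' entry once, before any rows are processed."""
    return [
        auto_choice(position) if setting == "auto" else setting
        for position, setting in enumerate(stage_settings)
    ]


auto_job = resolve_plan(["auto", "auto", "auto"], toy_auto_choice)
explicit_job = resolve_plan(["round robin", "same", "same"], toy_auto_choice)
hash_job = resolve_plan(["hash", "same", "same"], toy_auto_choice)

print(auto_job)
print(auto_job == explicit_job)
print(hash_job)
```

Under this assumption, the lookup happens once per link when the plan is built rather than once per row, so the Auto job and the explicit job would end up running the same plan.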

DS User
dsscholar
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Post by dsscholar »

Hi all,

Any suggestions regarding this query?

Thanks
dsscholar
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Post by dsscholar »

Hi guys,

Please help me with this query if you know the answer.

Thanks in advance.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Do you not have an official support provider you can ask? As a user I'm not in a position to know exactly how things work under the covers. I could guess but I'd rather not. Why not ping support and then post back here whatever you find?
-craig

"You can never have too many knives" -- Logan Nine Fingers
dsscholar
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Post by dsscholar »

Thanks, chulett. I will try to find out.