Auto partitioning

dsscholar · Post by **dsscholar** » Wed Sep 14, 2011 7:07 am

Hi all,

I have a job that runs with Auto partitioning, and none of its stages requires Hash or DB2 partitioning.

Is it better to leave it on Auto, or to choose Round Robin for the first stage and Same for the rest of the stages?

My reasoning is that by specifying Round Robin and Same myself, I save the time the parallel engine spends evaluating Auto and changing the partitioning to Round Robin. Is this correct, or is there no difference in performance?

Thanks in advance.

suse_dk · Post by **suse_dk** » Wed Sep 14, 2011 9:41 am

It should not make a performance difference either way.

If you are looking for a performance improvement in your job (besides running on more nodes), you could check whether you are able to read in parallel from your source.

If you want to know whether it is good development practice to set partitioning manually in a job or to rely on Auto, then I think opinions differ.
An instructor of mine (some time ago) recommended setting it manually because, as he said, "then you don't depend on IBM keeping the same default partitioning methods"... (although I don't think they would dare to change them). I would say it depends on how experienced the developer is, but when the logic depends on a specific partitioning method, I would like the developers I support to set it manually. (Later we might go through the job score and the monitor together to avoid repartitioning and unnecessary sorts in the job.)

dsscholar · Post by **dsscholar** » Wed Sep 14, 2011 11:37 pm

Hi,

My question is a direct one. If I set Auto, my job will choose Round Robin first and then Same, because of the way the stages are arranged. I am not talking about a big performance difference. If I set the partitioning methods explicitly, will the job skip the parallel engine's check that sets the partitioning to Round Robin at first when the method is Auto?

Please tell me whether there is any difference, even a small one, in logic or performance if I set the partitioning methods explicitly. Otherwise, please confirm that there is no difference.

Thanks in advance.

SURA · Post by **SURA** » Thu Sep 15, 2011 12:00 am

Once the job is compiled, the engine knows the stages, the partitioning methods and so on. If you selected a specific method, the job runs with that method; otherwise it uses Auto. If you need to sort or aggregate the data, you need a specific partitioning method, chosen according to the requirement. So your requirement decides the partitioning method, and that method is what gets you the correct result.
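A minimal Python sketch of why aggregation needs a key-based method, using plain lists to stand in for partitions. The function names are hypothetical (they are not DataStage APIs), and Python's built-in `hash` is only a stand-in for whatever hash function the engine actually uses. Each partition is aggregated on its own, as each parallel instance of an aggregating stage would be.

```python
from collections import defaultdict

def round_robin(rows, n):
    """Deal rows to n partitions in turn, without looking at their contents."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key):
    """Send every row with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def aggregate_per_partition(parts, key, value):
    """Sum `value` by `key` separately inside each partition."""
    out = []
    for part in parts:
        sums = defaultdict(int)
        for row in part:
            sums[row[key]] += row[value]
        out.extend(sorted(sums.items()))
    return out

rows = [{"cust": 1, "amt": 10}, {"cust": 1, "amt": 20},
        {"cust": 2, "amt": 5}, {"cust": 2, "amt": 7}]

print(aggregate_per_partition(round_robin(rows, 2), "cust", "amt"))
# [(1, 10), (2, 5), (1, 20), (2, 7)]  -> each customer appears twice
print(aggregate_per_partition(hash_partition(rows, 2, "cust"), "cust", "amt"))
# [(2, 12), (1, 30)]  -> one total per customer
```

With Round Robin the rows of one customer land in different partitions, so the output holds partial totals; with key-based partitioning every customer's rows meet in one partition and the totals are correct.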

With Auto it is not always Round Robin, but it usually is.

Basically, this is work the engine does regardless of whether the partitioning is Auto or user-defined.

DS User

ray.wurlod · Post by **ray.wurlod** » Thu Sep 15, 2011 1:59 am

If you set partitioning unwisely, you will get incorrect results and possibly degraded performance as well.

(Auto) is guaranteed to give correct results. It may not be optimal in all cases (for example Entire on reference inputs to Lookup stages, Hash in other cases where key-based partitioning is needed) but it will always work.
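A minimal Python sketch of the Lookup case, again with lists standing in for partitions and hypothetical function names (not DataStage APIs). Each partition's lookup can only see the reference rows held in that same partition, so a reference input partitioned without regard to the key can miss matches, while Entire gives every partition a full copy and always matches, at the cost of one copy of the reference data per partition.

```python
def round_robin(rows, n):
    """Deal rows to n partitions in turn, without looking at their contents."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def entire(rows, n):
    """Give every partition a complete copy of the data set."""
    return [list(rows) for _ in range(n)]

def lookup(stream_parts, ref_parts, key):
    """Match stream rows only against reference rows held in the same partition."""
    matched, missed = [], []
    for stream, ref in zip(stream_parts, ref_parts):
        index = {r[key]: r for r in ref}
        for row in stream:
            if row[key] in index:
                matched.append(row[key])
            else:
                missed.append(row[key])
    return matched, missed

stream = [{"code": c} for c in ["A", "B", "C", "D"]]
reference = [{"code": c, "desc": "desc " + c} for c in ["D", "C", "B", "A"]]

print(lookup(round_robin(stream, 2), round_robin(reference, 2), "code"))
# ([], ['A', 'C', 'B', 'D'])  -> every lookup misses
print(lookup(round_robin(stream, 2), entire(reference, 2), "code"))
# (['A', 'C', 'B', 'D'], [])  -> every lookup matches
```

The extra copies are why Entire is correct but not always optimal, for example when the reference data is large.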

dsscholar · Post by **dsscholar** » Thu Sep 15, 2011 5:04 am

Sura,

I understand all of that. My job is simply source ---> filter ---> copy ---> transformer ---> target. Please read this carefully. In one run I set Auto for all the stages. In another run I set Round Robin on the first partitioning setting and Same on the rest. I understand that setting specific partitioning methods wrongly will give incorrect results. But if I already know that Auto will choose Round Robin and then Same, I can set them directly. How does the engine handle this? If it is Auto, does it examine the stage and then choose the correct partitioning? Is some check happening or not, and am I eliminating it by setting the required partitioning explicitly? That is my question: do I eliminate that check, or does it happen some other way? Even if it takes only nanoseconds, I want to know how it happens and whether it happens at all.

Thanks in advance.

SURA · Post by **SURA** » Thu Sep 15, 2011 5:52 pm

I know what you mean. As far as I know, there is no documentation for these questions, because they concern DataStage architecture, i.e. how DataStage behaves internally. Ray or another real guru might know.

But my guess (only a guess) is that whether it is Auto or any other partitioning method, this information is stored somewhere and the engine acts on that entry, perhaps in something like the way the configuration file works.

For example, if it is Auto, it might have the entry: Round Robin, Same, Same

Whereas if you choose Hash, Same, Same, it might have the entry: Hash, Same, Same

My personal assumption is that when you run the job, the engine checks this information and uses those options to run the job.

Let us wait for proper answer.


dsscholar · Post by **dsscholar** » Sat Sep 17, 2011 7:39 am

Hi all,

Any suggestions regarding this query?

Thanks

dsscholar · Post by **dsscholar** » Mon Sep 19, 2011 8:45 pm

Hi guys,

Please help me with this query if you find out anything.

Thanks in advance.

chulett · Post by **chulett** » Tue Sep 20, 2011 7:08 am

Do you not have an official support provider you can ask? As a user I'm not in a position to know exactly how things work under the covers. I could guess but I'd rather not. Why not ping support and then post back here whatever you find?

dsscholar · Post by **dsscholar** » Wed Sep 21, 2011 9:09 am

Thanks, chulett. I will try to find out.