Partitioning and repartitioning in a MPP

Ragunathan Gunasekaran · Fri Jul 06, 2007 6:50 am

Hi,
Could anyone shed light on how to perform partitioning and repartitioning in an MPP/cluster system? Is there any product documentation that deals with the topic? If so, could you please name the document so that I can search it? I have looked in the Parallel Job Developer Guide and the Advanced Job Developer Guide, but I can't find anything like what is given for SMP in the Parallel Job Developer Guide.

gbusson · Fri Jul 06, 2007 7:17 am

Hi,

There is no difference between MPP and SMP in the ways of partitioning.

The "only" change is the APT_CONFIG_FILE.

ray.wurlod · Fri Jul 06, 2007 9:56 am

There's no difference in how you do it.

There's a significant difference in how it happens. In an SMP ("share everything") environment the repartitioning can take place through shared memory. In an MPP ("share nothing") environment the repartitioning occurs using TCP sockets (at network speeds rather than at memory speeds). Repartitioning is very costly in MPP environments. Avoid it unless it's necessary.

Ragunathan Gunasekaran · Sat Jul 07, 2007 4:46 am

How is this possible with the configuration file? Could you please explain in more detail what you mean?

ray.wurlod · Sat Jul 07, 2007 9:32 am

In an SMP environment the fastname is the same for every node.
In an MPP environment more than one fastname is used.

dspxguy · Tue Nov 27, 2007 1:44 pm

ray.wurlod wrote: In an SMP environment the fastname is the same for every node.
In an MPP environment more than one fastname is used. ...

Ray, how would we know if our environment is MPP or SMP?

ray.wurlod · Tue Nov 27, 2007 5:18 pm

In an SMP environment the fastname is the same for every node mentioned in the configuration file.
In an MPP environment more than one fastname is used.

If you have only one machine you are necessarily SMP. (DataStage treats NUMA (non-uniform memory architecture) as SMP for its purposes.)

If you have multiple machines on which DataStage processing takes place, then you are MPP. (DataStage uses MPP to encompass any multiple-machine environment, whether cluster, grid or whatever.)

Even with multiple machines, however, you may run a job using a configuration file that only mentions one distinct fastname. In that case, even though you have an MPP environment, your execution is SMP and shared memory will be used by the APT_Communicator class (communication between player processes, for example during re-partitioning) rather than TCP sockets.
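
The fastname rule above can be checked mechanically. What follows is only a rough sketch, not a DataStage tool: it assumes the usual `node "..." { fastname "..." ... }` layout of a parallel configuration file, the host names `etlhost1` and `etlhost2` and the disk paths are invented for illustration, and the helper names `distinct_fastnames` and `execution_style` are hypothetical. It does not strip comments from the file, so treat the result as a hint rather than a verdict.

```python
import re

# Hypothetical two-node configuration; host names and paths are made up.
SAMPLE_CONFIG = '''
{
  node "node1"
  {
    fastname "etlhost1"
    pools ""
    resource disk "/data/ds/disk1" {pools ""}
    resource scratchdisk "/data/ds/scratch1" {pools ""}
  }
  node "node2"
  {
    fastname "etlhost2"
    pools ""
    resource disk "/data/ds/disk2" {pools ""}
    resource scratchdisk "/data/ds/scratch2" {pools ""}
  }
}
'''

# Matches lines such as: fastname "etlhost1"
FASTNAME_RE = re.compile(r'\bfastname\s+"([^"]*)"')


def distinct_fastnames(config_text):
    """Return the set of fastname values found in the configuration text."""
    return set(FASTNAME_RE.findall(config_text))


def execution_style(config_text):
    """Classify a run as 'SMP' (one distinct fastname) or 'MPP' (several)."""
    names = distinct_fastnames(config_text)
    if not names:
        raise ValueError("no fastname entries found")
    return "SMP" if len(names) == 1 else "MPP"


if __name__ == "__main__":
    print(execution_style(SAMPLE_CONFIG))
```

To check a real job, read the file named by the APT_CONFIG_FILE environment variable and pass its text to `execution_style`. A result of "SMP" on a multi-machine installation corresponds to the last case described above: only one distinct fastname is mentioned, so that execution behaves as SMP.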

DSXchange
