Partitioning and repartitioning in an MPP

Ragunathan Gunasekaran
Participant
Posts: 247
Joined: Mon Jan 22, 2007 11:33 pm

Partitioning and repartitioning in an MPP

Post by Ragunathan Gunasekaran »

Hi,
Could anyone shed light on how to perform partitioning and repartitioning in an MPP/cluster system? Is there any product documentation that covers this topic? If so, could you please name the document so that I can search it? I have looked in the Parallel Job Developer Guide and the Advanced Job Developer Guide, but I can't find anything comparable to what the Parallel Job Developer Guide gives for SMP.
Regards
Ragu
gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France

Post by gbusson »

Hi,

There is no difference between MPP and SMP in the ways you partition.

The "only" change is in the configuration file (APT_CONFIG_FILE).
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There's no difference in how you do it.

There's a significant difference in how it happens. In an SMP ("share everything") environment the repartitioning can take place through shared memory. In an MPP ("share nothing") environment the repartitioning occurs using TCP sockets (at network speeds rather than at memory speeds). Repartitioning is very costly in MPP environments. Avoid it unless it's necessary.
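
As an illustration only (this is not DataStage code), here is a minimal Python sketch of why changing the partitioning key forces rows to move between partitions. It uses a simple modulus rule on an integer key; the column names `cust_id` and `order_id` and the sample rows are made up for the example. In an MPP layout, each row counted as "moved" would have to travel over the network to another machine.

```python
from collections import defaultdict


def modulus_partition(rows, key, num_partitions):
    """Assign each row to partition number (row[key] % num_partitions)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key] % num_partitions].append(row)
    return partitions


def rows_moved(rows, old_key, new_key, num_partitions):
    """Count rows whose partition changes when the partitioning key changes."""
    return sum(
        1
        for row in rows
        if row[old_key] % num_partitions != row[new_key] % num_partitions
    )


# Hypothetical sample data: eight rows with two integer key columns.
rows = [{"cust_id": i, "order_id": 100 + 3 * i} for i in range(8)]

by_customer = modulus_partition(rows, "cust_id", 4)
moved = rows_moved(rows, "cust_id", "order_id", 4)
print(f"{moved} of {len(rows)} rows change partition when the key changes")
```

If a job can keep the same partitioning key across consecutive stages, no rows need to move and this cost is avoided.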
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ragunathan Gunasekaran
Participant
Posts: 247
Joined: Mon Jan 22, 2007 11:33 pm

partitioning/repartitioning in MPP systems

Post by Ragunathan Gunasekaran »

How is this possible with the configuration file? Could you please give an overview of what you mean, with an explanation?
Regards
Ragu
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In an SMP environment the fastname is the same for every node.
In an MPP environment more than one fastname is used.
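
To make the distinction concrete, here is a sketch of two small configuration files. The host names (`etlhost1`, `etlhost2`) and the disk paths are hypothetical placeholders, not values from this thread. In the first, both nodes share one fastname, so the job runs SMP-style on a single machine:

```
{
	node "node1"
	{
		fastname "etlhost1"
		pools ""
		resource disk "/data/ds/node1" {pools ""}
		resource scratchdisk "/scratch/ds/node1" {pools ""}
	}
	node "node2"
	{
		fastname "etlhost1"
		pools ""
		resource disk "/data/ds/node2" {pools ""}
		resource scratchdisk "/scratch/ds/node2" {pools ""}
	}
}
```

In the second, the nodes name different machines, so the job runs MPP-style:

```
{
	node "node1"
	{
		fastname "etlhost1"
		pools ""
		resource disk "/data/ds/node1" {pools ""}
		resource scratchdisk "/scratch/ds/node1" {pools ""}
	}
	node "node2"
	{
		fastname "etlhost2"
		pools ""
		resource disk "/data/ds/node2" {pools ""}
		resource scratchdisk "/scratch/ds/node2" {pools ""}
	}
}
```

The job design is the same in both cases; only the file that APT_CONFIG_FILE points to differs.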
dspxguy
Participant
Posts: 156
Joined: Thu May 24, 2007 4:09 pm
Location: Simi Valley, CA

Post by dspxguy »

ray.wurlod wrote: In an SMP environment the fastname is the same for every node.
In an MPP environment more than one fastname is used. ...
Ray, how would we know if our environment is MPP or SMP?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In an SMP environment the fastname is the same for every node mentioned in the configuration file.
In an MPP environment more than one fastname is used.

If you have only one machine you are necessarily SMP. (DataStage treats NUMA (non-uniform memory access) as SMP for its purposes.)

If you have multiple machines on which DataStage processing takes place, then you are MPP. (DataStage uses MPP to encompass any multiple-machine environment, whether cluster, grid or whatever.)

Even with multiple machines, however, you may run a job using a configuration file that only mentions one distinct fastname. In that case, even though you have an MPP environment, your execution is SMP and shared memory will be used by the APT_Communicator class (communication between player processes, for example during re-partitioning) rather than TCP sockets.
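
One rough way to apply the rule above is to count the distinct fastname entries in the configuration file a job will run with. The following Python sketch does that with a regular expression; it is a helper written for this illustration, not a DataStage utility, and the function names, host names, and trimmed-down configuration text are all hypothetical.

```python
import re

# Matches entries such as: fastname "etlhost1"
FASTNAME_RE = re.compile(r'fastname\s+"([^"]+)"')


def distinct_fastnames(config_text):
    """Return the set of distinct fastname values found in the text."""
    return set(FASTNAME_RE.findall(config_text))


def execution_mode(config_text):
    """Return "SMP" for one distinct fastname, "MPP" for more than one."""
    names = distinct_fastnames(config_text)
    if not names:
        raise ValueError("no fastname entries found")
    return "SMP" if len(names) == 1 else "MPP"


# Hypothetical, trimmed-down configuration text (resource lines omitted).
smp_config = '''
{
    node "node1" { fastname "etlhost1" pools "" }
    node "node2" { fastname "etlhost1" pools "" }
}
'''

mpp_config = '''
{
    node "node1" { fastname "etlhost1" pools "" }
    node "node2" { fastname "etlhost2" pools "" }
}
'''

print(execution_mode(smp_config))
print(execution_mode(mpp_config))
```

To check a real file, read its contents (for example the file named by the APT_CONFIG_FILE environment variable) and pass the text to `execution_mode`.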