Working of the Partitioning

Gokul
Participant
Posts: 74
Joined: Wed Feb 23, 2005 10:58 pm
Location: Mumbai

Working of the Partitioning

Post by Gokul »

Hi,

I just want to understand the partitioning concepts in PX.

Example: I have a configuration file with 2 nodes. The source file is partitioned on custno,
e.g. the custno values are 1, 2, 3, 4.

So, data-wise there are 4 partitions but only 2 logical nodes.

1> How will the data be allocated to the 2 logical nodes?
2> Also, suppose I have a stage variable. As I understand it, 2 copies will be created.
Or will the number of stage variable copies be equal to the number of data partitions?

Thanks,
Gokul
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Please do not confuse processing nodes with partitioning. There are several different ways data may be partitioned over the available number of nodes. The number of nodes is specified (via the number of node names) in the configuration file currently being used.

Partitioning is the process of spreading rows amongst the processing nodes. How rows are partitioned will depend on the partitioning algorithm chosen. This is well described in the IBM class DX444 (DataStage Essentials).

No matter which partitioning algorithm is chosen, if you have two nodes, zero or more rows will be processed by the first node and the remaining rows will be processed by the other node, in completely separate processes (though they may communicate with each other should the job design call for repartitioning). Which rows go to which node is determined by the partitioning algorithm, which is specified on the input link of any stage.
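To make this concrete for the custno example in the question, here is a minimal Python sketch of a toy modulus-style partitioner. It is only an illustration: the function name `partition_by_modulus` is hypothetical and the arithmetic is an assumption, not the algorithm the parallel engine actually runs.

```python
# Illustrative only: a toy modulus partitioner, not DataStage's real implementation.
def partition_by_modulus(keys, node_count):
    """Assign each key to one of node_count partitions (numbered from 0)."""
    partitions = [[] for _ in range(node_count)]
    for key in keys:
        # The key value, not the number of distinct keys, picks the partition.
        partitions[key % node_count].append(key)
    return partitions


custnos = [1, 2, 3, 4]
result = partition_by_modulus(custnos, 2)
print(result)  # [[2, 4], [1, 3]]
```

Four distinct custno values still end up in just two partitions, one per node; the number of key values does not create extra partitions.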

It's not just stage variables that will be multiplied - every operator in the design will be cloned N times, where N is the number of nodes.
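A rough way to picture this cloning, sketched in Python under the assumption of two nodes: each clone runs the same logic on its own partition, with its own private copy of any running state. The names `run_operator` and `row_count` are made up for illustration, with `row_count` standing in for a stage variable.

```python
# Illustrative only: simulates N independent clones of one operator.
def run_operator(rows):
    """One clone: its counter starts fresh and sees only its own partition's rows."""
    row_count = 0  # stands in for a stage variable; each clone has its own copy
    output = []
    for row in rows:
        row_count += 1
        output.append((row, row_count))
    return output


partitions = [[2, 4], [1, 3]]  # hypothetical split of custno values across two nodes
per_node_output = [run_operator(p) for p in partitions]
print(per_node_output)  # [[(2, 1), (4, 2)], [(1, 1), (3, 2)]]
```

Each clone counts from 1 independently, so there are as many copies of the variable as there are nodes, not as many as there are distinct key values.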

When you are performing certain tasks, such as combining, comparing, grouping or sorting, then it is important that key-adjacency be achieved via thoughtful selection of partitioning algorithm.
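As a small sketch of why this matters, assume a per-custno row count on two nodes. The two partitioners below are simplified stand-ins written for this example (hypothetical names, not engine code): a round-robin split scatters rows with the same key across nodes, while a key-based split keeps them together.

```python
from collections import Counter


def round_robin(rows, node_count):
    """Deal rows out in turn, ignoring key values."""
    partitions = [[] for _ in range(node_count)]
    for i, row in enumerate(rows):
        partitions[i % node_count].append(row)
    return partitions


def key_based(rows, node_count):
    """Send every row with the same key to the same partition."""
    partitions = [[] for _ in range(node_count)]
    for row in rows:
        partitions[row % node_count].append(row)
    return partitions


custnos = [1, 1, 2, 2, 3, 3]  # two rows per customer

# Each node counts rows per custno using only the rows it received.
rr_counts = [dict(Counter(p)) for p in round_robin(custnos, 2)]
kb_counts = [dict(Counter(p)) for p in key_based(custnos, 2)]

print(rr_counts)  # [{1: 1, 2: 1, 3: 1}, {1: 1, 2: 1, 3: 1}]
print(kb_counts)  # [{2: 2}, {1: 2, 3: 2}]
```

With round robin each node reports only a partial count per customer, whereas the key-based split gives each node complete groups.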
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rameshv
Participant
Posts: 11
Joined: Wed Feb 27, 2008 11:14 pm

Re: Working of the Partitioning

Post by rameshv »

In DataStage, partitioning and nodes are different concepts.
Data can be partitioned in seven ways, while nodes are there to reduce the processing time.