Working of the Partitioning

Posted: Mon Feb 22, 2010 10:57 pm
by Gokul
Hi,

I just want to understand the partitioning concepts in PX.

Example: I have a configuration file with 2 nodes. The source file is partitioned on custno,
e.g. the custno values are 1, 2, 3, 4.

So, data-wise there are 4 partitions but only 2 logical nodes.

1> How will the data be allocated to the 2 logical nodes?
2> Also, if I have a stage variable, as I understand it 2 copies will be created.
Or will the number of stage variable copies be equal to the number of data partitions?

Thanks,
Gokul

Posted: Mon Feb 22, 2010 11:05 pm
by ray.wurlod
Please do not confuse processing nodes with partitioning. There are several different ways data may be partitioned over the available number of nodes. The number of nodes is specified (via the number of node names) in the configuration file currently being used.

Partitioning is the process of spreading rows amongst the processing nodes. How rows are partitioned will depend on the partitioning algorithm chosen. This is well described in the IBM class DX444 (DataStage Essentials).

No matter which partitioning algorithm is chosen, if you have two nodes, zero or more rows will be processed by the first node and the remaining rows will be processed by the other node, in completely separate processes (though they may communicate with each other should the job design call for repartitioning). Which rows go to which node is determined by the partitioning algorithm, which is specified on the input link of any stage.
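
To make this concrete for the custno example in the question, here is a rough sketch in Python (not DataStage code). It imitates two common partitioning styles, modulus on the key and round robin, over 2 nodes. The function names and the row layout are assumptions made up for illustration; the real partitioners run inside the parallel engine.

```python
# Illustrative sketch only: plain Python standing in for the engine's partitioners.
# Function names and the dict-based row layout are hypothetical.

def modulus_partition(rows, key, num_nodes):
    """Send each row to node (key value mod number of nodes)."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[row[key] % num_nodes].append(row)
    return partitions


def round_robin_partition(rows, num_nodes):
    """Deal rows out to the nodes in turn, ignoring the key."""
    partitions = [[] for _ in range(num_nodes)]
    for position, row in enumerate(rows):
        partitions[position % num_nodes].append(row)
    return partitions


rows = [{"custno": c} for c in (1, 2, 3, 4)]
by_modulus = modulus_partition(rows, "custno", 2)
by_round_robin = round_robin_partition(rows, 2)

for node, part in enumerate(by_modulus):
    print("modulus node", node, [r["custno"] for r in part])
for node, part in enumerate(by_round_robin):
    print("round robin node", node, [r["custno"] for r in part])
```

Either way there are only 2 partitions, one per node. The four custno values are simply shared out between them according to the algorithm.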

It's not just stage variables that will be multiplied - every operator in the design will be cloned N times, where N is the number of nodes.
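
A small Python sketch of that point, under the assumption that a stage variable behaves like private state held by each clone of the operator. The class name TransformerInstance and the counter are hypothetical stand-ins, not anything from the product.

```python
# Illustrative sketch only: one object per node, each with its own "stage variable".
# TransformerInstance and row_count are hypothetical names.

class TransformerInstance:
    """One clone of an operator, running on a single node."""

    def __init__(self, node):
        self.node = node
        self.row_count = 0  # stands in for a stage variable; private to this clone

    def process(self, row):
        self.row_count += 1
        return {**row, "node": self.node, "seq": self.row_count}


num_nodes = 2
instances = [TransformerInstance(n) for n in range(num_nodes)]

# Rows keyed on custno, routed by modulus for the sake of the example.
custnos = [1, 2, 3, 4, 1, 2]
output = [instances[c % num_nodes].process({"custno": c}) for c in custnos]

counters = [inst.row_count for inst in instances]
print(counters)
```

There are 4 distinct custno values here but only 2 counters, because the copies follow the number of nodes and not the number of key values.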

When you are performing certain tasks, such as combining, comparing, grouping or sorting, then it is important that key-adjacency be achieved via thoughtful selection of partitioning algorithm.
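
As a hedged illustration of why this matters for grouping, the Python sketch below counts rows per custno separately on each node. The helper per_node_counts and its assign argument are invented for the example. With round robin, the rows for one custno end up on both nodes, so each node sees only part of the group. With key-based partitioning, all rows for a custno land on the same node.

```python
# Illustrative sketch only: per-node group counts under two partitioning choices.
# per_node_counts and assign are hypothetical names.
from collections import Counter


def per_node_counts(custnos, num_nodes, assign):
    """Count rows per custno on each node; assign picks the node for a row."""
    nodes = [Counter() for _ in range(num_nodes)]
    for position, custno in enumerate(custnos):
        nodes[assign(position, custno) % num_nodes][custno] += 1
    return nodes


custnos = [1, 1, 2, 2]

# Round robin: the node depends on the row position, not on the key.
round_robin = per_node_counts(custnos, 2, lambda position, custno: position)

# Key-based: the node depends only on custno, so a group is never split.
key_based = per_node_counts(custnos, 2, lambda position, custno: custno)

print("round robin:", [dict(c) for c in round_robin])
print("key based:  ", [dict(c) for c in key_based])
```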

Re: Working of the Partitioning

Posted: Mon Feb 22, 2010 11:32 pm
by rameshv
In DataStage, partitioning and nodes are different concepts.
Data can be partitioned in seven different ways, while nodes are there to reduce the processing time by sharing the work among them.