PX configuration


pxlearn
Participant
Posts: 11
Joined: Sat May 07, 2005 4:07 am
Location: Chicago,IL,USA

Post by pxlearn »

Hi,


Could anybody please advise me on how to calculate the optimal number of nodes to configure, based on the number of servers and CPUs? For example: a cluster of 2 servers with 16 CPUs on each server, extracting millions of records from files/databases and loading the data into a DB2 partitioned database.

What would be the suggested configuration: 2, 4, or 8 nodes?

Should the number of DS nodes be the same as the number of DB2 partitions/nodes? If they are not the same, does that have an impact during loading?
Thanks,
Pxlearn
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

There are so many factors that can affect the optimal configuration that there really is no easy answer.

The best approach, in my opinion, is to develop with a configuration of 2 or more nodes. Once the job is ready to be tested, do timings with 1 node (yes, a single-node configuration will often be the most efficient and fastest), then 2, then more nodes, until you see performance drop off or stop improving.

You will be surprised how often a 1 or 2 node configuration will be the fastest even on very large machines.
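
The timing approach above can be sketched as a small helper. This is only an illustration, not part of DataStage: the function name `pick_node_count`, the `tolerance` parameter, and the sample timings are all hypothetical, and preferring the smaller node count when the gain is marginal is an assumption added here. The real numbers would come from your own runs of the same job under different configuration files.

```python
def pick_node_count(timings, tolerance=0.05):
    """Return the smallest node count whose elapsed time is within
    `tolerance` (a fraction) of the fastest measured run.

    timings: dict mapping node count -> elapsed seconds that you measured.
    """
    if not timings:
        raise ValueError("no timings supplied")
    best = min(timings.values())
    # Check node counts in ascending order, so that fewer nodes win
    # whenever adding more nodes gives no meaningful improvement.
    for nodes in sorted(timings):
        if timings[nodes] <= best * (1 + tolerance):
            return nodes


# Hypothetical measurements, for illustration only.
sample = {1: 420.0, 2: 260.0, 4: 250.0, 8: 310.0}
print(pick_node_count(sample))
```

With these made-up numbers the 4-node run is fastest, but the 2-node run is within 5% of it, so the helper picks 2 nodes.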
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

For each distinct partition count N among your partitioned DB2 tables, have at least one configuration containing N processing nodes. By using the DB2 partitioning algorithm in these cases you can achieve true parallel loading without the need to repartition data.
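
As a rough sketch of keeping one configuration per partition count, the helper below generates the text of a parallel configuration file with a chosen number of nodes. The function name `build_config`, the host names, and the disk and scratch paths are hypothetical, and spreading nodes round-robin across hosts is just one possible layout. The node/fastname/pools/resource layout follows the usual configuration file syntax, but check it against the default configuration file of your own installation before relying on it.

```python
def build_config(hosts, total_nodes,
                 disk="/data/ds/datasets",      # hypothetical path
                 scratch="/data/ds/scratch"):   # hypothetical path
    """Return configuration file text with `total_nodes` logical nodes,
    assigned to the given hosts in round-robin order."""
    if not hosts or total_nodes < 1:
        raise ValueError("need at least one host and one node")
    blocks = []
    for i in range(total_nodes):
        host = hosts[i % len(hosts)]
        blocks.append(
            f'\tnode "node{i + 1}"\n'
            "\t{\n"
            f'\t\tfastname "{host}"\n'
            '\t\tpools ""\n'
            f'\t\tresource disk "{disk}" {{pools ""}}\n'
            f'\t\tresource scratchdisk "{scratch}" {{pools ""}}\n'
            "\t}\n"
        )
    return "{\n" + "".join(blocks) + "}\n"


# Example: a DB2 table with 4 partitions, two hypothetical servers.
print(build_config(["server1", "server2"], 4))
```

The generated file would then be selected per job through the APT_CONFIG_FILE environment variable, so jobs that load a 4-partition table can use the 4-node file while other jobs use a smaller one.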

Other than that, the number of processing nodes is driven primarily by resource limitations. If you have so many nodes that you swamp the machine(s) with demands for resources, then some of your jobs/operators may terminate abnormally, or even fail to start, because of lack of resources.

It also depends on what else you're trying to do. For example, running eight two-node jobs at the same time is equivalent (all else being equal) to running one 16-node job.
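
The arithmetic behind that comparison, as a minimal sketch: the function name `total_partitions` is hypothetical, and the model deliberately ignores everything except the node count per job (the "all else being equal" part).

```python
def total_partitions(jobs):
    """jobs: list of (job_count, nodes_per_job) tuples for jobs
    running at the same time. Returns the combined degree of parallelism."""
    return sum(count * nodes for count, nodes in jobs)


eight_two_node_jobs = total_partitions([(8, 2)])
one_sixteen_node_job = total_partitions([(1, 16)])
print(eight_two_node_jobs, one_sixteen_node_job)  # 16 16
```

So when sizing configurations, the expected number of concurrent jobs matters as much as the node count of any single job.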
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.