Number of processors used

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

JeroenDmt
Premium Member
Posts: 107
Joined: Wed Oct 26, 2005 7:36 am

Number of processors used

Post by JeroenDmt »

We have a new server with supposedly 2 processors.
I was curious to see the performance difference between running parallel jobs on 1 or 2 processors, so I made a few test jobs and ran them with a 1, 2 and 4 node configuration file.
I know the nodes don't necessarily correspond to different processors being used, but I was expecting to see at least some difference. However, the execution times of all the jobs seem to be about the same for all three configuration files.

The test jobs I used are, for example:
- job 1 reads a database (sequentially) and writes the result to a dataset using 1, 2 or 4 nodes.
- job 2 reads this dataset and writes the result to another dataset.
I'm not sure about the first job, but I had expected the second job to run faster with a 2 or 4 node configuration file than with 1 node.

I checked the datasets: they use the correct number of nodes and the data is distributed evenly over them.

Does this mean DataStage might not be using both processors? Is there any way to see whether one or two processors are actually being used?
Or are my test jobs simply not suited to showing the difference? Or am I thinking in the wrong direction?
JeroenDmt
Premium Member
Posts: 107
Joined: Wed Oct 26, 2005 7:36 am

Re: Number of processors used

Post by JeroenDmt »

Apparently I do have to read the books again.
According to the Unix operator, my first job hardly used any CPU at all (it was probably too busy reading and writing to even need 2 CPUs).

Another test job that required more processing (sequential file -> column importer -> sort stage -> dataset) appeared to use both CPUs completely, so at least that part is going okay.

However, the execution times with a 1, 2 or 4 node configuration file are still about the same.

If someone can explain this to me, please do. I would have expected some difference between the different configuration files.
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Re: Number of processors used

Post by priyadarshikunal »

JeroenDmt wrote: However, the execution times with a 1, 2 or 4 node configuration file are still about the same.
Can you post the configuration details, such as the partitioning schemes used in your job?
There are other possibilities, but check this first.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
JeroenDmt
Premium Member
Posts: 107
Joined: Wed Oct 26, 2005 7:36 am

Re: Number of processors used

Post by JeroenDmt »

priyadarshikunal wrote: Can you post the configuration details, such as the partitioning schemes used in your job?
There are other possibilities, but check this first.
The partitioning is all left at the default since it's just a test; I don't think that should really make a difference here. I checked the distribution of the data over the nodes and it is spread out evenly.

The configuration file used is

Code: Select all

main_program: APT configuration file: /datastage/Ascential/DataStage/Configurations/FourNodes.apt
{
	node "node1"
	{
		fastname "phls6037"
		pools ""
		resource disk "/datastage/Ascential/DataStage/Datasets" {pools ""}
		resource scratchdisk "/datastage/Ascential/DataStage/Scratch" {pools ""}
	}
	node "node2"
	{
		fastname "phls6037"
		pools ""
		resource disk "/datastage/Ascential/DataStage/Datasets" {pools ""}
		resource scratchdisk "/datastage/Ascential/DataStage/Scratch" {pools ""}
	}
	node "node3"
	{
		fastname "phls6037"
		pools ""
		resource disk "/datastage/Ascential/DataStage/Datasets" {pools ""}
		resource scratchdisk "/datastage/Ascential/DataStage/Scratch" {pools ""}
	}
	node "node4"
	{
		fastname "phls6037"
		pools ""
		resource disk "/datastage/Ascential/DataStage/Datasets" {pools ""}
		resource scratchdisk "/datastage/Ascential/DataStage/Scratch" {pools ""}
	}
}
(This is the 4-node file; for 1 or 2 nodes it's simply the first 1 or 2 node blocks.)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

PX moves data in chunks of at least 32 KB. If you have less data than that - or even less than 128 KB, in my experience - you're not really going to get good parallelism. (For fixed-width data the transport block size is even larger: 1 MB by default.) Try your tests with larger data volumes - at least 10 MB.
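
To put rough numbers on that, here is a quick back-of-the-envelope sketch (the average row width is only an assumed figure, so plug in your own):

Code: Select all

# Rough check: how many transport blocks does a given data volume yield
# per partition? Block sizes are the ones quoted above; the average row
# width is an assumed figure.
rows = 2_500_000              # row count quoted later in this thread
avg_row_bytes = 50            # assumed average row width - adjust to your schema
transport_block = 32 * 1024   # smallest transport block size quoted above (32 KB)

total_bytes = rows * avg_row_bytes
print(f"total volume: {total_bytes / 1024 / 1024:.0f} MB")
for nodes in (1, 2, 4):
    blocks = total_bytes / nodes / transport_block
    print(f"{nodes} node(s): ~{blocks:,.0f} transport blocks per partition")

If that works out to only a handful of blocks per partition, the extra nodes have nothing to chew on and the run times will all look the same.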
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

I don't think you will be able to appreciate the parallel capabilities on a 2-CPU server. You probably won't get much performance variation using 1, 2 or 4 node pools that couldn't be attributed to process task-switching overhead. You have OS overhead, which will be more apparent on a 2-CPU server than on a 4-CPU server. Also, you won't be able to control resources as cleanly via node pool configuration on 2 CPUs as you would with 16 CPUs.

I don't think the conclusions from the tests you would perform can be applied to larger environments. For example, if you see that a 2-CPU, 2-node-pool environment performs better than a 2-CPU, 4-node-pool environment, you can't project from that and say a 16-CPU, 16-node-pool environment is better than a 16-CPU, 32-node-pool environment.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
JeroenDmt
Premium Member
Posts: 107
Joined: Wed Oct 26, 2005 7:36 am

Post by JeroenDmt »

kcbland wrote: I don't think you will be able to appreciate the parallel capabilities on a 2-CPU server. You probably won't get much performance variation using 1, 2 or 4 node pools that couldn't be attributed to process task-switching overhead.
What configuration would be "good enough" to see those differences? Would 4 CPUs be enough, or does it only really start to count when the number of processors goes up further?

And Ray: the test data was about 2.5 million records, around 120 MB or thereabouts.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Often a 1-node configuration will run better than a 2- or 4-node configuration. It is hard to know which level of parallelism will give the best performance in a production environment.

My approach is:
a) Always design with a 2 (or more) node configuration. This ensures that your lookups and other partitioning and repartitioning are coded correctly. A job designed on a 1-node configuration might not run correctly on more nodes, but a job designed on multiple nodes will always run in a 1-node configuration.

b) Run timing tests in dev/sit/prod [wherever you have a representative environment] with 1, 2, 3, 4, 8 and so on nodes to see whether there are differences, and use the lowest number of nodes that gives you acceptable performance (see the sketch below). If an 8-node config is 10% faster than a 4-node config, I would use the 4 nodes, since you have doubled the number of processes and system overhead for only a nominal speed increase.
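
Here is a minimal sketch of such a timing run, assuming the job exposes the $APT_CONFIG_FILE environment variable as a job parameter and that dsjob is on your path (the project and job names, and the one- and two-node file names, are placeholders):

Code: Select all

# Minimal timing harness: run the same job once per configuration file
# and record the elapsed wall-clock time. Assumes $APT_CONFIG_FILE is
# exposed as a job parameter; project/job names and the 1- and 2-node
# file names below are placeholders.
import subprocess
import time

configs = {
    1: "/datastage/Ascential/DataStage/Configurations/OneNode.apt",    # assumed name
    2: "/datastage/Ascential/DataStage/Configurations/TwoNodes.apt",   # assumed name
    4: "/datastage/Ascential/DataStage/Configurations/FourNodes.apt",  # from this thread
}

for nodes, apt_file in configs.items():
    start = time.time()
    subprocess.run(
        ["dsjob", "-run", "-wait",
         "-param", f"$APT_CONFIG_FILE={apt_file}",
         "MyProject", "MyTestJob"],   # placeholder project and job names
        check=True,
    )
    print(f"{nodes} node(s): {time.time() - start:.0f} seconds elapsed")

Run it against representative data volumes; on small test sets the differences tend to disappear into startup overhead.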

I think the Pareto rule applies here - 80% of all jobs will run best on 1 node and 20% will run better with more parallelism, and those jobs will probably need 80% of your system resources.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

It's important to separate the discussion of node configuration by hardware configuration. In a clustered environment you can really see the difference between node configurations, because you can exclude or include different servers. The variance in performance is apparent because CPUs are physically being excluded.

When you're on a single server, all CPUs on that machine are available to PX processes, irrespective of the node configuration. The node configuration deals with partitioning of data: parallel tasks doing the same work on different subsets of the data. When you have 1 CPU, it does all of the work for all of the nodes. When you have 2 CPUs, each does half of the work for all of the nodes, but now the OS has twice the number of processes to manage and therefore spends more CPU cycles managing them. In addition, the conductor processes have more work to do. If you have 4 nodes, you're adding 4x the burden of managing processes and memory/disk resources.

You have excess computing power when you have more CPUs than logical nodes. With a 1-node configuration, each active stage is going to be a single process. So 3 active stages means 3 processes; on a 2-CPU box you will probably be overloaded, as long as the job is not dealing with a database and just uses local files. Adding more logical nodes just overloads the machine even more. But if you have 8 CPUs and a 1-node configuration running a job with 3 active stages, you will not overload the CPUs. With a 2-node configuration you will generate 6 (simplified) active processes, and a 4-node configuration generates 12 processes, which on 8 CPUs puts you over the top.
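
Putting that simplified arithmetic into a few lines (conductor and section-leader processes are left out to keep the sketch small):

Code: Select all

# Simplified player-process arithmetic: each active stage runs roughly one
# player process per logical node. Conductor and section-leader processes
# are ignored here.
active_stages = 3
cpus = 8

for nodes in (1, 2, 4):
    players = active_stages * nodes
    verdict = "fits" if players <= cpus else "over the top"
    print(f"{nodes} node(s): {players} player processes on {cpus} CPUs -> {verdict}")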

Job designs matter! The number of active stages and the node configuration together determine the number of processes sitting on your CPUs. Both have to be carefully considered when designing jobs.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle