CPU CYCLES

just4u_sharath · Post by **just4u_sharath** » Mon Apr 14, 2008 5:58 pm

No matter how many logical nodes we have, or how many CPUs are brached to each logical node, the job always consumes the same number of CPU cycles.
Is the above statement correct? Please let me know.

ray.wurlod · Post by **ray.wurlod** » Mon Apr 14, 2008 6:17 pm

The answer is a definite maybe. What does "brached" mean?

It depends also on many factors, not all of them internal to DataStage, such as whether CPU cycles have to be expended on paging activities as each process exhausts its timeslice and gets another one. So one of the factors is how the operating system CPU scheduler is configured - whether it favours foreground or background processing. There are myriad other potential factors.

just4u_sharath · Post by **just4u_sharath** » Mon Apr 14, 2008 6:59 pm

ray.wurlod wrote:The answer is a definite maybe. What does "brached" mean?

It depends also on many factors, not all of them internal to DataStage, such as whether CPU cycles have to be expended on paging activities as each process exhausts its timeslice and gets another one. So one of the factors is how the operating system CPU scheduler is configured - whether it favours foreground or background processing. There are myriad other potential factors.

How can we find out the number of CPUs assigned to each logical node?
I heard Ascential charges companies based on the number of CPU cycles the company uses. If this is the case, companies can use as many logical nodes as they like for every job, so that the job runs very fast at the same cost. Am I right?
Also, each logical node is assigned to 2 or more CPUs. Ascential recommends 2 CPUs. How can we find out this information? Is there any Unix command to find out how many CPUs there are under a specific node?

ray.wurlod · Post by **ray.wurlod** » Mon Apr 14, 2008 7:08 pm

Licensing for DataStage is based on the number of CPUs, not on the number of CPU cycles.

Cycle (or MIPS) licensing is really only ever encountered with mainframe systems. That said, I am not aware of the DataStage licensing model on USS - check with your support provider.

There is no relationship whatsoever between logical nodes and CPUs.

Your job will create a score at runtime that includes a number of operators. Each operator, plus a controlling process called a "section leader", will execute as separate processes. It is how those processes are allocated to CPUs by the operating system CPU scheduler - and which can vary over time - that determines the total number of CPUs dedicated to DataStage processing at any given instant.

Therefore your question about reporting the number of CPUs per node is moot.
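To put rough numbers on the point about processes versus CPUs, here is a minimal Python sketch. It is not a DataStage API. The function name `estimate_processes` is hypothetical, and it assumes every operator runs on every node, with one section leader per node, ignoring operator combination and the conductor process. `os.cpu_count()` only reports what the operating system sees; it says nothing about nodes.

```python
import os


def estimate_processes(operators: int, nodes: int) -> int:
    """Rough count of processes a parallel job might start.

    Assumption (a simplification, not taken from the job score):
    one process per operator per node, plus one section leader per node.
    """
    if operators < 0 or nodes < 1:
        raise ValueError("operators must be >= 0 and nodes must be >= 1")
    return nodes * (operators + 1)


if __name__ == "__main__":
    cpus = os.cpu_count()  # CPUs reported by the OS; may be None
    procs = estimate_processes(operators=10, nodes=4)
    print(f"CPUs visible to the OS: {cpus}")
    print(f"Estimated job processes: {procs}")
```

However many processes that comes to, it is the operating system scheduler that decides which CPU each one runs on at any moment.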

John Smith · Post by **John Smith** » Mon Apr 14, 2008 11:56 pm

Hmm... where did you get your info? Ascential does not exist anymore; it's IBM.

just4u_sharath · Post by **just4u_sharath** » Tue Apr 15, 2008 3:24 pm

ray.wurlod wrote:Licensing for DataStage is based on the number of CPUs, not on the number of CPU cycles.

Cycle (or MIPS) licensing is really only ever encountered with mainframe systems. That said, I am not aware of the DataStage licensing model on USS - check with your support provider.

There is no relationship whatsoever between logical nodes and CPUs.

Your job will create a score at runtime that includes a number of operators. Each operator, plus a controlling process called a "section leader", will execute as separate processes. It is how those processes are allocated to CPUs by the operating system CPU scheduler - and which can vary over time - that determines the total number of CPUs dedicated to DataStage processing at any given instant.

Therefore your question about reporting the number of CPUs per node is moot.

So you say there is no relationship between nodes and CPUs. So if I have 20 CPUs and there are 40 operators, then all 40 operators are executed by the 20 CPUs. If this is the case, the speed of the job depends on the number of CPUs, not on logical nodes. So nodes have significance only for partitioning, and no importance when we talk about the speed of the job. I also heard that each node has some fixed number of CPUs. Some nodes may have more CPUs and some fewer. Is this statement true?

ray.wurlod · Post by **ray.wurlod** » Tue Apr 15, 2008 3:40 pm

You "heard" did you? Where?

Read what I wrote again (in both earlier posts on this thread). The answer to your most recent question is in there, if you give it some thought.

Post by **daignault** » Tue Apr 15, 2008 3:51 pm

just4u_sharath wrote:
ray.wurlod wrote:Licensing for DataStage is based on the number of CPUs, not on the number of CPU cycles.

Cycle (or MIPS) licensing is really only ever encountered with mainframe systems. That said, I am not aware of the DataStage licensing model on USS - check with your support provider.

There is no relationship whatsoever between logical nodes and CPUs.

Your job will create a score at runtime that includes a number of operators. Each operator, plus a controlling process called a "section leader", will execute as separate processes. It is how those processes are allocated to CPUs by the operating system CPU scheduler - and which can vary over time - that determines the total number of CPUs dedicated to DataStage processing at any given instant.

Therefore your question about reporting the number of CPUs per node is moot.
So you say there is no relationship between nodes and CPUs. So if I have 20 CPUs and there are 40 operators, then all 40 operators are executed by the 20 CPUs. If this is the case, the speed of the job depends on the number of CPUs, not on logical nodes. So nodes have significance only for partitioning, and no importance when we talk about the speed of the job. I also heard that each node has some fixed number of CPUs. Some nodes may have more CPUs and some fewer. Is this statement true?

This is not quite true. There is no binding of processes to a CPU (affinity) within the EE engine. However, if you have a 4-CPU system and set your APT configuration file for 20 nodes, you will have a large number of processes in the queue in a ready state for CPU allocation, although most of the processes will be blocked on I/O.

Also, in the same 4CPU/20 node environment, if your job connects to an Oracle database at both ends (1 input and 1 output) then you will be establishing at least 40 connections to your Oracle instance. This could lead to an unhappy DBA of the Oracle instance.

In the reverse case, if you have a 20-CPU system and allocate only 2 nodes in the APT configuration file, there will be a limited number of processes in the ready queue awaiting processing. And since there are only 2 streams of input data, you will most likely throttle the job through lack of data to process.

The usual rule of thumb is twice as many nodes as CPUs in your APT configuration file.
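A minimal Python sketch of the arithmetic above, plus a generator for a single-host configuration file. The function names, the host name `etlhost` and the disk paths are all hypothetical, and the file layout is the general shape of a parallel configuration file as I understand it, so check it against your own installation before relying on it.

```python
def node_count_rule_of_thumb(cpus: int, factor: int = 2) -> int:
    """Apply the '2 x CPUs' rule of thumb for the number of logical nodes."""
    return cpus * factor


def min_db_connections(nodes: int, db_stages: int) -> int:
    """Lower bound on database connections: one per node per database stage."""
    return nodes * db_stages


def render_config(hostname: str, nodes: int,
                  disk: str = "/data/ds/disk",          # hypothetical path
                  scratch: str = "/data/ds/scratch") -> str:  # hypothetical path
    """Render a configuration file with `nodes` logical nodes on one host."""
    lines = ["{"]
    for i in range(1, nodes + 1):
        lines += [
            f'  node "node{i}"',
            "  {",
            f'    fastname "{hostname}"',
            '    pools ""',
            f'    resource disk "{disk}" {{pools ""}}',
            f'    resource scratchdisk "{scratch}" {{pools ""}}',
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    nodes = node_count_rule_of_thumb(cpus=4)      # 8 logical nodes
    print(render_config("etlhost", nodes))
    # The 20-node job with one Oracle input and one Oracle output:
    print(min_db_connections(nodes=20, db_stages=2))  # 40
```

Note that every node in the generated file names the same host; nothing in it refers to a CPU.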

Regards,

Ray Daignault

ray.wurlod · Post by **ray.wurlod** » Tue Apr 15, 2008 4:50 pm

The rule of the other thumb is to have some smaller configuration files for running multiple small jobs at the same time.

kumar_s · Post by **kumar_s** » Tue Apr 15, 2008 5:24 pm

As said, the number of processes varies with the number of nodes. Each node is allocated a list of processes, one for each operator, and the CPU pool in each node shares the processes allocated to it.

ray.wurlod · Post by **ray.wurlod** » Tue Apr 15, 2008 6:42 pm

There's no such thing as a "CPU pool in each node".

CPUs are totally independent of nodes. Processes are started from nodes; the operating system allocates processes to CPUs on a time-share basis. Even one process may visit several CPUs during the course of a long-running execution (but only one at any one time). DataStage does not attempt to control this; that's one of the main functions of a multi-user operating system.
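For anyone who wants to see this from the operating system side, here is a small Python sketch. It has nothing to do with DataStage itself; the helper name `allowed_cpus` is hypothetical. `os.sched_getaffinity` is available on Linux only, so the sketch falls back to `os.cpu_count()` elsewhere (AIX, Solaris and HP-UX have their own tools).

```python
import os


def allowed_cpus() -> set:
    """Return the set of CPU indices this process may be scheduled on."""
    if hasattr(os, "sched_getaffinity"):      # Linux only
        return set(os.sched_getaffinity(0))   # 0 means the calling process
    # Fallback assumption: no affinity mask, every reported CPU is allowed.
    return set(range(os.cpu_count() or 1))


if __name__ == "__main__":
    cpus = allowed_cpus()
    print(f"This process may run on {len(cpus)} CPU(s): {sorted(cpus)}")
```

Unless an administrator has set a mask, a process is typically allowed on every CPU in the machine, which is why a node definition cannot pin work to particular CPUs.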

just4u_sharath · Post by **just4u_sharath** » Tue Apr 15, 2008 7:42 pm

ray.wurlod wrote:There's no such thing as a "CPU pool in each node".

CPUs are totally independent of nodes. Processes are started from nodes; the operating system allocates processes to CPUs on a time-share basis. Even one process may visit several CPUs during the course of a long-running execution (but only one at any one time). DataStage does not attempt to control this; that's one of the main functions of a multi-user operating system.

If CPUs are totally independent of nodes, then how can I restrict my job to run on specific CPUs? I am assuming that specific CPUs will be assigned to each node and that, by constraining the job to run on specific nodes, the job is constrained to specific CPUs. If there is no relation between CPUs and nodes, and every small job uses all the CPUs, that will be a problem.

John Smith · Post by **John Smith** » Tue Apr 15, 2008 7:49 pm

Simple answer is you can't. It's up to the OS.

ray.wurlod · Post by **ray.wurlod** » Tue Apr 15, 2008 7:52 pm

The only way you can restrict nodes to execute on specific CPUs is where those CPUs are in separate machines, and you are not executing in a grid environment.

In that case you can create node pools containing just the node(s) on the machine(s) - and therefore CPU(s) - in question.

There is no other way to restrict nodes to CPUs.
Your assumption is totally incorrect.

Post by **daignault** » Tue Apr 15, 2008 8:16 pm

If affinity of job to CPU is what you are trying to accomplish, IBM has the solution for you :)

If you use LPARs on a Power5 AIX system, you can designate CPU affinity and limit the CPU/memory resources assigned to a DataStage EE job executing in that LPAR.

Never say it can't be done. Just throw money at it and it can be done :)

Ray D

DSXchange
