CPU CYCLES

Post questions here related to DataStage Enterprise/PX Edition for such areas as parallel job design, parallel datasets, BuildOps, Wrappers, etc.

just4u_sharath
Premium Member
Posts: 236
Joined: Sun Apr 01, 2007 7:41 am
Location: Michigan

Post by just4u_sharath »

However many logical nodes we have, or however many CPUs are brached to each logical node, the job always consumes the same number of CPU cycles.
Is the above statement correct? Please let me know.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The answer is a definite maybe. What does "brached" mean?

It also depends on many factors, not all of them internal to DataStage, such as whether CPU cycles have to be expended on paging activities as each process exhausts its timeslice and gets another one. So one of the factors is how the operating system CPU scheduler is configured: whether it favours foreground or background processing. There are myriad other potential factors.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by just4u_sharath »

How can we find out the number of CPUs assigned to each logical node?
I heard Ascential charges companies based on the number of CPU cycles they use. If that is the case, companies can use as many logical nodes as they like for every job, so that the job runs very fast at the same cost. Am I right?
Also, each logical node is assigned 2 or more CPUs; Ascential recommends 2 CPUs. How can we find out this information? Is there any Unix command that shows how many CPUs are under a specific node?

Post by ray.wurlod »

Licensing for DataStage is based on the number of CPUs, not on the number of CPU cycles.

Cycle (or MIPS) licensing is really only ever encountered with mainframe systems. That said, I am not aware of the DataStage licensing model on USS - check with your support provider.

There is no relationship whatsoever between logical nodes and CPUs.

Your job will create a score at runtime that includes a number of operators. Each operator, plus a controlling process called a "section leader", will execute as separate processes. It is how those processes are allocated to CPUs by the operating system CPU scheduler - and which can vary over time - that determines the total number of CPUs dedicated to DataStage processing at any given instant.

Therefore your question about reporting the number of CPUs per node is moot.
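
To make the point concrete, here is a rough tally of the processes a score might create, as a minimal Python sketch. It assumes a simplified model (one conductor, one section leader per node, one player per operator per node, no operator combination, every operator running on every node); real scores differ, and the function name is hypothetical. Note that no CPU count appears anywhere in the calculation.

```python
def estimate_score_processes(num_nodes: int, num_operators: int) -> int:
    """Rough process count for one parallel job run (simplified model).

    Assumptions, not DataStage's exact behaviour:
      - 1 conductor process for the whole job
      - 1 section leader per logical node
      - 1 player process per operator per logical node
        (no operator combination; every operator runs on every node)
    """
    if num_nodes < 1 or num_operators < 0:
        raise ValueError("need at least one node and a non-negative operator count")
    conductor = 1
    section_leaders = num_nodes
    players = num_operators * num_nodes
    return conductor + section_leaders + players


if __name__ == "__main__":
    # Which CPU runs each of these processes is decided by the OS scheduler.
    for nodes in (2, 4, 20):
        total = estimate_score_processes(nodes, num_operators=10)
        print(f"{nodes} nodes, 10 operators -> {total} processes")
```

Under these assumptions a 20-node configuration file yields the same number of processes whether the machine has 4 CPUs or 20; only the way the operating system schedules them changes.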
John Smith
Charter Member
Posts: 193
Joined: Tue Sep 05, 2006 8:01 pm
Location: Australia

Post by John Smith »

Mmm... where did you get your info? Ascential does not exist anymore; it's IBM. :lol:

Post by just4u_sharath »

So you say there is no relationship between nodes and CPUs. So if I have 20 CPUs and there are 40 operators, then all these 40 operators are executed by the 20 CPUs. If this is the case, the speed of the job depends on the number of CPUs, not on logical nodes. So nodes have significance only for partitioning; they have no importance when we talk about the speed of the job. I also heard that each node has some fixed number of CPUs, and that some nodes may have more CPUs and some fewer. Is this statement true?

Post by ray.wurlod »

You "heard" did you? Where?

Read what I wrote again (in both earlier posts on this thread). The answer to your most recent question is in there, if you give it some thought.
daignault
Premium Member
Posts: 165
Joined: Tue Mar 30, 2004 2:44 pm

Post by daignault »

just4u_sharath wrote:
So you say there is no relationship between nodes and CPUs. So if I have 20 CPUs and there are 40 operators, then all these 40 operators are executed by the 20 CPUs. If this is the case, the speed of the job depends on the number of CPUs, not on logical nodes. So nodes have significance only for partitioning; they have no importance when we talk about the speed of the job. I also heard that each node has some fixed number of CPUs, and that some nodes may have more CPUs and some fewer. Is this statement true?
This is not quite true. There is no binding of processes to a CPU (affinity) within the EE engine. However, if you have a 4-CPU system and set your APT configuration file for 20 nodes, you will have a large number of processes in the ready queue waiting for CPU allocation. In practice, though, most of those processes will be blocked waiting on I/O.

Also, in the same 4-CPU/20-node environment, if your job connects to an Oracle database at both ends (one input and one output), you will be establishing at least 40 connections to your Oracle instance (20 nodes × 2 Oracle stages). That could lead to an unhappy Oracle DBA.
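
The arithmetic behind that figure, as a small Python sketch. It assumes, as the post implies, that each database stage runs on every node and opens one connection per node; the function is hypothetical, and the result is a lower bound because a real stage may open additional sessions.

```python
def min_db_connections(num_nodes: int, db_stages: int) -> int:
    """Lower bound on database sessions opened by one run of a job.

    Assumes each database stage runs on every node and opens exactly one
    connection per node. The machine's CPU count does not enter into it.
    """
    if num_nodes < 1 or db_stages < 0:
        raise ValueError("need at least one node and a non-negative stage count")
    return num_nodes * db_stages


if __name__ == "__main__":
    # 20-node configuration file, one Oracle source stage and one Oracle target stage.
    print(min_db_connections(num_nodes=20, db_stages=2))
    # The same job under a 2-node configuration file.
    print(min_db_connections(num_nodes=2, db_stages=2))
```

The connection count scales with the node count in the configuration file, not with the number of CPUs, which is why an oversized file can hurt the database even when the ETL server itself copes.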

In the reverse case, if you have a 20-CPU system and allocate only 2 nodes in the APT configuration file, there will be only a limited number of processes in the ready queue awaiting processing. And since there are only 2 streams of input data, you will most likely throttle the job through lack of data to process.

The usual rule of thumb is 2xCPU in your APT file.

Regards,

Ray Daignault

Post by ray.wurlod »

The rule of the other thumb is to have some smaller configuration files for running multiple small jobs at the same time.
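
One way to keep such files handy is to generate them. The Python sketch below renders a minimal single-host configuration file in the usual node/fastname/pools/resource layout; a job then picks one up through the APT_CONFIG_FILE environment variable. The host name, directory paths and the helper function are hypothetical, and a real file would normally place disk and scratchdisk resources on separate file systems.

```python
def make_apt_config(host: str, num_nodes: int, base_dir: str) -> str:
    """Render a minimal single-host parallel configuration file as text.

    host:     the machine's fastname (its network host name)
    base_dir: hypothetical root under which per-node data and scratch
              directories are assumed to exist
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    parts = ["{\n"]
    for i in range(1, num_nodes + 1):
        # One logical node per block; all nodes share the same host.
        parts.append(
            f'  node "node{i}"\n'
            "  {\n"
            f'    fastname "{host}"\n'
            '    pools ""\n'
            f'    resource disk "{base_dir}/data{i}" {{pools ""}}\n'
            f'    resource scratchdisk "{base_dir}/scratch{i}" {{pools ""}}\n'
            "  }\n"
        )
    parts.append("}\n")
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical host and directory: a small 2-node file for running
    # several small jobs side by side.
    print(make_apt_config("etlhost", 2, "/ds/work"))
```

Keeping, say, an 8-node file for the big jobs and a 2-node file for the small ones means each small job starts a fraction of the processes, leaving room for several of them to run at once.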
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

As said, the number of processes varies with the number of nodes. Each node is allocated a list of processes, one for each operator, and the CPU pool in each node shares the processes allocated to it.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'

Post by ray.wurlod »

There's no such thing as a "CPU pool in each node".

CPUs are totally independent of nodes. Processes are started from nodes; the operating system allocates processes to CPUs on a time-share basis. Even one process may visit several CPUs during the course of a long-running execution (but only one at any one time). DataStage does not attempt to control this; that's one of the main functions of a multi-user operating system.
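
The time-sharing behaviour described above can be illustrated with a toy round-robin model in Python: more runnable processes than CPUs, each process runs on one CPU per timeslice, then rejoins the ready queue. This is a teaching sketch under heavy simplifications, not how any particular Unix scheduler (or DataStage) works; it only shows why a single process can end up having run on several CPUs while never occupying two at once.

```python
from collections import deque


def simulate_round_robin(num_procs: int, num_cpus: int, timeslices: int) -> dict[int, set[int]]:
    """Toy round-robin scheduler that records which CPUs each process ran on.

    Deliberately simplified: no priorities, no I/O blocking, no cache
    affinity. Real Unix schedulers are far more sophisticated.
    """
    ready = deque(range(num_procs))
    visited: dict[int, set[int]] = {p: set() for p in range(num_procs)}
    for _ in range(timeslices):
        running = []
        # Each CPU takes the next process from the ready queue, so a process
        # occupies at most one CPU during any single timeslice.
        for cpu in range(num_cpus):
            if not ready:
                break
            proc = ready.popleft()
            visited[proc].add(cpu)
            running.append(proc)
        # Timeslice expires: the running processes rejoin the back of the queue.
        ready.extend(running)
    return visited


if __name__ == "__main__":
    # Three processes sharing two CPUs for three timeslices.
    for proc, cpus in simulate_round_robin(3, 2, 3).items():
        print(f"process {proc} ran on CPUs {sorted(cpus)}")
```

In this toy run every one of the three processes ends up having used both CPUs; with no more processes than CPUs (two processes on two CPUs, say) each stays on one CPU. Either way, the placement is the scheduler's decision, not the node's.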

Post by just4u_sharath »

If CPUs are totally independent of nodes, then how can I restrict my job to run on specific CPUs? I was assuming that specific CPUs are assigned to each node, so that constraining the job to run on specific nodes also constrains it to specific CPUs. If there is no relationship between CPUs and nodes, and every small job can run on all the CPUs, that will be a problem.

Post by John Smith »

The simple answer is you can't. It's up to the OS.

Post by ray.wurlod »

The only way you can restrict nodes to execute on specific CPUs is where those CPUs are in separate machines, and you are not executing in a grid environment.

In that case you can create node pools containing just the node(s) on the machine(s) - and therefore CPU(s) - in question.

There is no other way to restrict nodes to CPUs.
Your assumption is totally incorrect.
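
For the multi-machine case (separate machines, not a grid), a configuration file with one named node pool per machine might be generated as in the Python sketch below. Host names, pool names, paths and the helper are hypothetical. Each node is also kept in the default (empty-named) pool so unconstrained stages can still run everywhere; it is the stage's node pool constraint, set in the job design, that keeps processing on one machine's nodes and hence on that machine's CPUs.

```python
def make_pooled_config(host_pools: dict[str, str], nodes_per_host: int) -> str:
    """Render a multi-host configuration file with one named pool per host.

    host_pools maps each machine's fastname to a pool name (both hypothetical).
    Every node is also kept in the default pool "" so that stages without a
    node pool constraint can still run on all nodes.
    """
    lines = ["{"]
    n = 0
    for host, pool in host_pools.items():
        for _ in range(nodes_per_host):
            n += 1
            lines += [
                f'  node "node{n}"',
                "  {",
                f'    fastname "{host}"',
                f'    pools "" "{pool}"',
                f'    resource disk "/data/node{n}" {{pools ""}}',
                f'    resource scratchdisk "/scratch/node{n}" {{pools ""}}',
                "  }",
            ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two separate machines. A stage constrained to "poolB" would run only
    # on node3 and node4, i.e. only on machineB's CPUs.
    print(make_pooled_config({"machineA": "poolA", "machineB": "poolB"}, 2))
```

Within each machine, which of its CPUs runs a given process is still left to that machine's operating system.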

Post by daignault »

If affinity of job to CPU is what you are trying to accomplish, IBM has the solution for you :)

If you use LPARs on a Power5 AIX system, you can designate CPU affinity and limit the CPU and memory resources assigned to a DataStage EE job executing in that LPAR.

Never say it can't be done. Just throw money at it and it can be done :)

Ray D