CPU CYCLES

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

John Smith
Charter Member
Posts: 193
Joined: Tue Sep 05, 2006 8:01 pm
Location: Australia

Post by John Smith »

daignault wrote:If affinity of job to CPU is what you are trying to accomplish, IBM has the solution for you :)

If you use LPARs on a Power5 AIX system, you can designate CPU affinity and limit the CPU/memory resources assigned to a DataStage EE job executing in that LPAR.

Never say it Can't be done. Just throw money and it can be done :)

Ray D
As I said, my answer was a "simple answer". Without any real information about the OP's environment and constraints, that's the simple answer. Of course, with money, time and resources, everything (well, almost everything) is possible... ;)
just4u_sharath
Premium Member
Posts: 236
Joined: Sun Apr 01, 2007 7:41 am
Location: Michigan

CPU CYCLES

Post by just4u_sharath »

daignault wrote:If affinity of job to CPU is what you are trying to accomplish, IBM has the solution for you :)

If you use LPARs on a Power5 AIX system, you can designate CPU affinity and limit the CPU/memory resources assigned to a DataStage EE job executing in that LPAR.

Never say it Can't be done. Just throw money and it can be done :)

Ray D
So based on this, can I assume that currently all the CPUs work on all operators (one at a time)? We cannot restrict the job to certain CPUs (unless we throw money at it). Nodes are in no way related to CPUs; nodes are only used to generate the Unix process for each operator, and those processes are executed by the CPUs.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You cannot even assume that.

"all the CPUS work on all operators (one at a time)" is only true if you have more operators than CPUs, and that the CPUs aren't also doing other things such as managing a database server's processes.

Nodes are used to govern the degree of parallelism under which your job runs. Each operator generates one process on each node in which that operator executes, which in turn may be affected by the use of node pools within the job design.

You can't even guarantee that the same operator from all nodes is executing at the same time: this will depend on how evenly the data are distributed across the nodes, and what else is happening on those nodes. On the other hand, you don't need to worry about this - this is precisely what the Orchestrate framework manages for you. Further complexity is added if you re-partition your data, but let's not go there.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Each machine has its own set of CPUs, and machines are assigned to nodes via the configuration file. If you have more than one machine (fastname), they can all be assigned to a single node, or each machine can be assigned to a separate node. The processes created for a given node are shared among the CPUs available to that node, and in turn to its machine(s). So the processes of all nodes will not necessarily be executed by the same CPUs.
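To illustrate, a configuration file can spread nodes across machines by giving each node a different fastname. The machine names and paths here are made up for the sketch:

```
{
  node "n1"
  {
    fastname "machine1"
    pools ""
    resource disk "/ds/data" {pools ""}
    resource scratchdisk "/ds/scratch" {pools ""}
  }
  node "n2"
  {
    fastname "machine2"
    pools ""
    resource disk "/ds/data" {pools ""}
    resource scratchdisk "/ds/scratch" {pools ""}
  }
}
```

With this file, processes for node n1 are scheduled on machine1's CPUs and those for n2 on machine2's; which physical CPU runs a given process within each machine is left to that machine's operating system scheduler.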
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'