Spawning of processes in a DSEE environment

sathyanveshi
Participant
Posts: 66
Joined: Tue Dec 07, 2004 12:48 pm


Post by sathyanveshi »

Hi,

Suppose I have 8 CPUs and configure 4 nodes on them. Can I assume that 1 node is equivalent to 2 CPUs?

Also, is a node a Unix process? Is fork() what gets invoked when the CPUs are converted into nodes?

Cheers,
Mohan
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Sathyanveshi,

Only the OS controls which CPUs processes execute on; DataStage has no control over where a specific Orchestrate pid lands. So in your case, with a 4-node configuration file and 8 CPUs, you cannot assume that a node maps to 2 CPUs. Nonetheless, UNIX will spread the processing load across all the available CPUs, so effectively you can pretend that this is happening.

The number of nodes you declare in your configuration file tells DataStage how many distinct parallel streams it needs to split the job into. Then, depending on the number and type of stages in the job, these concurrent processing streams are further broken down into separate processes (pids visible in the ps command). From this point on, the operating system takes over and moves processes around. When a given "node" process gets swapped out, it will not necessarily execute on the same physical CPU when it gets brought back into memory.



The underlying mechanism that UNIX uses to spawn a new process is the fork() call.
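
To make the fork() point concrete, here is a rough sketch in Python of a parent process forking one child per logical node. This is only an illustration of the system call, not how DataStage itself is implemented: the function name spawn_workers and the idea of one child per node are my own assumptions, and a real job will typically show more pids than nodes. Note that nothing in the code picks a CPU; that decision stays with the OS scheduler.

```python
import os


def spawn_workers(node_count):
    """Fork one child per logical node (hypothetical helper, not a DataStage API).

    Returns a list of (node_index, child_pid, exit_code) tuples.
    """
    children = []
    for node in range(node_count):
        pid = os.fork()          # duplicates the calling process
        if pid == 0:
            # Child: fork() returned 0. Real work would happen here.
            # The OS alone decides which CPU this process runs on.
            os._exit(0)          # leave immediately, skipping parent cleanup
        # Parent: fork() returned the pid of the new child.
        children.append((node, pid))

    results = []
    for node, pid in children:
        _, status = os.waitpid(pid, 0)   # reap the child so no zombie remains
        results.append((node, pid, os.WEXITSTATUS(status)))
    return results


if __name__ == "__main__":
    for node, pid, code in spawn_workers(4):
        print(f"node {node}: pid {pid} exited with code {code}")
```

Running it with 4 "nodes" on an 8-CPU box gives four distinct pids, but which CPU each one ran on is not something the program chose or can assume.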