What is the meaning of a node in DataStage?

n.parameswara.reddy@accen
Participant
Posts: 40
Joined: Mon May 18, 2009 5:22 am

Post by n.parameswara.reddy@accen »

What is the meaning of a node in DataStage? We thought that it is a processor. Please give us some explanation of this.

Thanks
Reddy
Last edited by n.parameswara.reddy@accen on Mon Jun 29, 2009 6:57 am, edited 1 time in total.
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

Hi,

You really need to read the documentation to understand this properly, but if you want a throwaway comment about what a "node" is, then in simple terms it is "somewhere to run a process or processes".
Mark Winter
Nothing appeases a troubled mind more than good music
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Welcome.

A node is a logical rather than a physical concept, and while the number of CPUs plays a part here, a node is not equivalent to a CPU. Think of it as the number of "invocations" or "instances" or "copies" of a job to run, each one sharing the workload and transforming its share of the data. They could all be running on separate CPUs or they could all be running on the same one; that's not up to you but rather the underlying O/S.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rsunny
Participant
Posts: 223
Joined: Sat Jul 03, 2010 10:22 pm

Post by rsunny »

Hi craig,

In general, when we talk about 2 readers per node and the number of nodes is 2, is the number of invocations or copies of the job 4, each one sharing the workload and transforming its share of the data?

thanks in advance
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The number of nodes is determined absolutely by the number of node names in a syntactically valid parallel execution configuration file.

As noted, a node is a logical construct, associated with a set of resources available when execution occurs on that node.
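
To illustrate how the node count follows from the node names in the configuration file, here is a minimal Python sketch. The sample text follows the usual layout of a parallel configuration file, but the node names, the host name `etl_host` and all paths are placeholders invented for this example, and the `node_names` helper is hypothetical, not part of DataStage. Note that both nodes point at the same host, which is why a node cannot be equated with a machine or a CPU.

```python
import re

# Hypothetical configuration text: host name and paths are placeholders.
SAMPLE_CONFIG = '''
{
    node "node1"
    {
        fastname "etl_host"
        pools ""
        resource disk "/data/ds/node1" {pools ""}
        resource scratchdisk "/scratch/ds/node1" {pools ""}
    }
    node "node2"
    {
        fastname "etl_host"
        pools ""
        resource disk "/data/ds/node2" {pools ""}
        resource scratchdisk "/scratch/ds/node2" {pools ""}
    }
}
'''


def node_names(config_text):
    """Return the node names declared in the configuration text (illustrative helper)."""
    # A node declaration is a line that starts with the keyword `node`
    # followed by a quoted name; `fastname` lines do not match.
    return re.findall(r'^\s*node\s+"([^"]+)"', config_text, flags=re.MULTILINE)


names = node_names(SAMPLE_CONFIG)
print(len(names), names)
```

This only counts declarations in a text sample; it does not validate the file the way the engine does.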

So, if the job is run using a configuration file that specifies four nodes, then four "copies" of that job execute, each processing approximately one quarter of the records in the source data set. [Other factors can militate against achieving such even distribution, but these are beyond the scope of the current question.]
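
The "approximately one quarter" point can be sketched in a few lines of Python. This is a toy model only: it assumes a round-robin distribution, which is just one possible partitioning method, and the `round_robin_partition` function is a hypothetical name for this example rather than anything DataStage exposes.

```python
def round_robin_partition(records, node_count):
    """Deal records out to node_count partitions in turn (assumed round-robin)."""
    partitions = [[] for _ in range(node_count)]
    for index, record in enumerate(records):
        partitions[index % node_count].append(record)
    return partitions


records = list(range(1, 11))              # ten source records
parts = round_robin_partition(records, 4)  # four logical nodes

for node_number, part in enumerate(parts):
    print(f"node{node_number}: {len(part)} records -> {part}")
```

With ten records and four nodes the shares come out as 3, 3, 2 and 2 records, which is why each copy handles only approximately a quarter of the data.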

Questions about multiple readers per node really belong in a separate thread.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.