What is the meaning of "node" in DataStage? We thought that it is a processor. Please give us some explanation of this.
Thanks
Reddy
What is the meaning of node in DataStage
Moderators: chulett, rschirm, roy
- Participant
- Posts: 40
- Joined: Mon May 18, 2009 5:22 am
Last edited by n.parameswara.reddy@accen on Mon Jun 29, 2009 6:57 am, edited 1 time in total.
Welcome.
A node is a logical rather than a physical concept. While the number of CPUs plays a part here, a node is not equivalent to a CPU. Think of the node count as the number of "invocations" or "instances" or "copies" of a job to run, each one sharing the workload and transforming its share of the data. They could all be running on separate CPUs or they could all be running on the same one; that's not up to you but rather the underlying O/S.
-craig
"You can never have too many knives" -- Logan Nine Fingers
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
The number of nodes is determined absolutely by the number of node names in a syntactically valid parallel execution configuration file.
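To make that concrete, here is a minimal Python sketch that counts the node names in a sample configuration file. The host name and paths in the sample are hypothetical, and the regular expression is only a rough illustration of "count the node names"; it is not how the engine itself parses the file.

```python
import re

# Hypothetical two-node configuration file. The host name "etlhost" and
# all paths are made up for illustration. Both nodes share one fastname,
# so two logical nodes are defined on a single machine.
SAMPLE_CONFIG = '''
{
    node "node1"
    {
        fastname "etlhost"
        pools ""
        resource disk "/data/ds/node1" {pools ""}
        resource scratchdisk "/scratch/node1" {pools ""}
    }
    node "node2"
    {
        fastname "etlhost"
        pools ""
        resource disk "/data/ds/node2" {pools ""}
        resource scratchdisk "/scratch/node2" {pools ""}
    }
}
'''


def node_names(config_text):
    """Return the names that follow each `node` keyword (rough approximation)."""
    return re.findall(r'\bnode\s+"([^"]+)"', config_text)


names = node_names(SAMPLE_CONFIG)
print(names)       # ['node1', 'node2']
print(len(names))  # 2
```

Note that the sample defines two nodes on one host: the node count comes from the file, not from the number of machines or CPUs.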
As noted, a node is a logical construct, associated with a set of resources available when execution occurs on that node.
So, if the job is run using a configuration file that specifies four nodes, then four "copies" of that job execute, each processing approximately one quarter of the records in the source data set. [Other factors can militate against achieving such even distribution, but these are beyond the scope of the current question.]
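The "approximately one quarter each" arithmetic can be sketched in Python as follows. This assumes a simple round-robin distribution purely for illustration; the function name is hypothetical, and the partitioning actually used in a job depends on how that job is designed.

```python
def round_robin_partition(records, node_count):
    """Deal records out to node_count partitions in turn (illustrative only)."""
    partitions = [[] for _ in range(node_count)]
    for index, record in enumerate(records):
        partitions[index % node_count].append(record)
    return partitions


# 1002 source records spread over a four-node configuration.
records = list(range(1002))
partitions = round_robin_partition(records, 4)

sizes = [len(p) for p in partitions]
print(sizes)  # [251, 251, 250, 250]
```

Each of the four "copies" of the job would then work only on its own partition, which is why the shares are roughly, rather than exactly, one quarter.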
Questions about multiple readers per node really belong in a separate thread.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.