Details of Node and Pool in the configuration File

ReachKumar
Participant
Posts: 29
Joined: Wed Jan 06, 2010 7:18 am

Post by ReachKumar »

Hi,

Can someone explain the terms Node and Pool, and the relationship between Nodes and Pools, in the DataStage configuration file?
Regards,
Kumar
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

A node is a parallel stream running a copy of your job. If your configuration file has 8 nodes configured, then your job will default to running 8 parallel job-streams every time it runs.

A pool is nothing more than a means of grouping resources for processing purposes, whether it be a disk pool or a node pool. For example, with the 8-node configuration above you could tag half of your nodes with the name "SmallPool". Then you could run your job and restrict it to the "SmallPool" nodes, which would reduce the job to running only four parallel processes.
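
A shortened sketch of that idea, using four nodes instead of eight to keep it readable, with two of them tagged. The host name "etlhost" and the directory paths are made up for illustration; only the pool name "SmallPool" comes from the example above.

```
{
    /* node1 and node2: default pool "" plus the "SmallPool" tag */
    node "node1"
    {
        fastname "etlhost"
        pools "" "SmallPool"
        resource disk "/data/ds/node1" {pools ""}
        resource scratchdisk "/scratch/ds/node1" {pools ""}
    }
    node "node2"
    {
        fastname "etlhost"
        pools "" "SmallPool"
        resource disk "/data/ds/node2" {pools ""}
        resource scratchdisk "/scratch/ds/node2" {pools ""}
    }
    /* node3 and node4: default pool only */
    node "node3"
    {
        fastname "etlhost"
        pools ""
        resource disk "/data/ds/node3" {pools ""}
        resource scratchdisk "/scratch/ds/node3" {pools ""}
    }
    node "node4"
    {
        fastname "etlhost"
        pools ""
        resource disk "/data/ds/node4" {pools ""}
        resource scratchdisk "/scratch/ds/node4" {pools ""}
    }
}
```

With a file like this, anything constrained to "SmallPool" would run on node1 and node2 only, while unconstrained stages still see all four nodes through the default pool "".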

Note: node pools are not commonly used at most sites. More commonly I see different configuration files used to assign different numbers of nodes, probably because lots of people find pools confusing. :-)
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
ReachKumar
Participant
Posts: 29
Joined: Wed Jan 06, 2010 7:18 am

Post by ReachKumar »

Thanks, asorrell.

One more clarification:
If a pool is nothing but a grouping of resources, like a disk pool, then what are resource disk and scratch disk?

Is a disk pool the same as resource disk and scratch disk?
Please explain.
Regards,
Kumar
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

Resource Disk: the location where your persistent data, such as datasets and filesets, is stored.

Scratch Disk: the disk space DataStage uses to create temporary files as and when needed. For example, DataStage creates temporary files while sorting data, and these are cleared out after the sort has been performed.

We can group the resource and scratch disks into pools.
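
A minimal sketch of how that grouping might look for a single node. The paths, the host name "etlhost" and the disk pool name "archive" are made up; "sort" is, as far as I recall, a pool name that the sort operator looks for first when picking scratch space, so treat that detail as an assumption to check against your version's documentation.

```
{
    node "node1"
    {
        fastname "etlhost"
        pools ""
        /* resource disk: persistent data such as datasets and filesets */
        resource disk "/data/ds/node1" {pools ""}
        resource disk "/data/ds/archive" {pools "archive"}
        /* scratch disk: temporary files, for example during sorts */
        resource scratchdisk "/scratch/ds/node1" {pools ""}
        resource scratchdisk "/scratch/ds/sort1" {pools "sort"}
    }
}
```

So a disk pool is not the same thing as a resource disk or a scratch disk; it is just a name attached to one or more of those entries so they can be selected as a group.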

Gurus, correct me if I am wrong.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Node Pools can be subsets of the available nodes. The default node pool (which has the name "" in the configuration file) must include at least one node.

Disk Pools can be subsets or supersets of the disk (resource or scratch) available.

One site where I worked used a 34-node configuration, of which 10 nodes were assigned to processing and 24 were assigned (in a DB2 node pool) to the DB2 stages. This site processed huge volumes of data. At busier times they changed the configuration to use 16 nodes for processing and 24 for DB2.
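
A scaled-down sketch of that kind of split, with two processing nodes and two DB2 nodes rather than 10 and 24. This is not that site's actual file: the host names, paths and the exact pool name "DB2" are assumptions for illustration.

```
{
    /* processing nodes: members of the default pool "" */
    node "proc1"
    {
        fastname "etlhost"
        pools ""
        resource disk "/data/ds/proc1" {pools ""}
        resource scratchdisk "/scratch/ds/proc1" {pools ""}
    }
    node "proc2"
    {
        fastname "etlhost"
        pools ""
        resource disk "/data/ds/proc2" {pools ""}
        resource scratchdisk "/scratch/ds/proc2" {pools ""}
    }
    /* DB2 nodes: only in the "DB2" node pool, not in the default pool */
    node "db2node1"
    {
        fastname "db2host1"
        pools "DB2"
        resource disk "/data/ds/db2node1" {pools ""}
        resource scratchdisk "/scratch/ds/db2node1" {pools ""}
    }
    node "db2node2"
    {
        fastname "db2host2"
        pools "DB2"
        resource disk "/data/ds/db2node2" {pools ""}
        resource scratchdisk "/scratch/ds/db2node2" {pools ""}
    }
}
```

Because the DB2 nodes are left out of the default pool, ordinary stages would run two-way on proc1 and proc2, and only stages constrained to the "DB2" pool would use the other two nodes.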
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.