Details of Node and Pool in the Configuration File

ReachKumar · Post by **ReachKumar** » Tue Mar 23, 2010 8:18 am

Hi,

Can someone explain the terms Node and Pool, and the relationship between nodes and pools, in the DataStage configuration file?

IBM Analytics Champion 2009 - 2020 · Post by **asorrell** » Tue Mar 23, 2010 11:08 am

A node is a parallel stream running a copy of your job. If your configuration file has 8 nodes configured, then your job will default to running 8 parallel job-streams every time your job processes.

A pool is nothing more than a means of grouping resources for processing purposes, whether it be a disk pool or a node pool. For example, you could tag half of your nodes with the name "SmallPool". Then you could run your job and restrict it to the "SmallPool" nodes, which in the 8-node case above would reduce the job to only four parallel processes.
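To make the "SmallPool" idea concrete, here is a minimal sketch of a configuration file, cut down to four nodes to keep it short, with half of them tagged. The host name `etlhost` and all directory paths are made up for illustration. Every node stays in the default pool (`""`), so an unconstrained job still runs four ways, while anything constrained to "SmallPool" runs two ways.

```
{
  /* node1 and node2 belong to the default pool and to SmallPool */
  node "node1"
  {
    fastname "etlhost"
    pools "" "SmallPool"
    resource disk "/data/ds/node1" {pools ""}
    resource scratchdisk "/scratch/ds/node1" {pools ""}
  }
  node "node2"
  {
    fastname "etlhost"
    pools "" "SmallPool"
    resource disk "/data/ds/node2" {pools ""}
    resource scratchdisk "/scratch/ds/node2" {pools ""}
  }
  /* node3 and node4 belong to the default pool only */
  node "node3"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds/node3" {pools ""}
    resource scratchdisk "/scratch/ds/node3" {pools ""}
  }
  node "node4"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds/node4" {pools ""}
    resource scratchdisk "/scratch/ds/node4" {pools ""}
  }
}
```

The restriction itself is not set in this file. As far as I know it is applied per stage, through the node pool constraint on the stage's Advanced properties.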

Note: node pools are not commonly used at most sites. More often I see different configuration files used to assign different numbers of nodes, probably because lots of people find pools confusing.

ReachKumar · Post by **ReachKumar** » Wed Mar 24, 2010 12:19 am

Thanks, asorrell.

One more clarification:
If a pool is nothing but a grouping of resources, like a disk pool, then what are resource disk and scratch disk?

Is a disk pool the same as resource disk and scratch disk?
Please explain.

zulfi123786 · Post by **zulfi123786** » Wed Mar 24, 2010 12:46 am

Resource Disk: This is the location where your persistent data is stored, such as datasets and filesets.

Scratch Disk: This is the disk space DataStage uses to create temporary files as and when needed. For example, DataStage creates temporary files while sorting data, and these are cleared out once the sort has completed.

We can group the resource and scratch disks into pools.
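As a rough sketch of how that looks in the configuration file, the single-node fragment below declares two resource disks and two scratch disks and tags some of them with disk pool names. The host name, the paths, and the pool name "bigdata" are made up for illustration. The "sort" pool name is used on the assumption that sorts prefer scratch disks in a pool of that name when one exists; verify this against your version's documentation.

```
{
  node "node1"
  {
    fastname "etlhost"
    pools ""
    /* persistent data: datasets, filesets */
    resource disk "/data/ds/datasets" {pools ""}
    resource disk "/data/ds/bigfiles" {pools "" "bigdata"}
    /* temporary files; the second entry is reserved for the "sort" disk pool */
    resource scratchdisk "/scratch/ds/general" {pools ""}
    resource scratchdisk "/scratch/ds/sort" {pools "sort"}
  }
}
```

So a disk pool is not the same thing as a resource disk or a scratch disk. The `resource disk` and `resource scratchdisk` entries are the disks themselves, and the `pools` list on each entry is just a label that groups them.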

Gurus, correct me if I am wrong.

ray.wurlod · Post by **ray.wurlod** » Wed Mar 24, 2010 1:05 am

Node Pools can be subsets of the available nodes. The default node pool (which has the name "" in the configuration file) must include at least one node.

Disk Pools can be subsets or supersets of the disk (resource or scratch) available.

One site where I worked used a 34-node configuration, of which 10 nodes were assigned to processing and 24 were assigned (in a DB2 node pool) to the DB2 stages. This site processed huge volumes of data. At busier times they changed configuration to use 16 for processing and 24 for DB2.
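A scaled-down sketch of that kind of split, with two processing nodes and two DB2 nodes instead of 10 and 24, might look like the following. Host names and paths are hypothetical, and the pool name "DB2" is illustrative; check which pool name your DB2 stages actually expect. Because the DB2 nodes do not list `""`, they are outside the default node pool, so ordinary stages run only on node1 and node2.

```
{
  /* processing nodes: members of the default pool */
  node "node1"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds/node1" {pools ""}
    resource scratchdisk "/scratch/ds/node1" {pools ""}
  }
  node "node2"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds/node2" {pools ""}
    resource scratchdisk "/scratch/ds/node2" {pools ""}
  }
  /* DB2 nodes: members of the DB2 pool only, not the default pool */
  node "db2node1"
  {
    fastname "db2host1"
    pools "DB2"
    resource disk "/data/ds/db2node1" {pools ""}
    resource scratchdisk "/scratch/ds/db2node1" {pools ""}
  }
  node "db2node2"
  {
    fastname "db2host2"
    pools "DB2"
    resource disk "/data/ds/db2node2" {pools ""}
    resource scratchdisk "/scratch/ds/db2node2" {pools ""}
  }
}
```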