configuration of nodes in default.apt

Posted: Thu Apr 21, 2016 5:16 am
by tine_bi
Hi

We have an environment where, in addition to the C: drive, we have access to D:, E: and F:.

How should we best configure our .apt configuration file?

Currently I am testing

{
    node "node1"
    {
        fastname "dsprdsrv1"
        pools ""
        resource disk "D:/IBM_NODE_CONFIG/Datasets" {pools ""}
        resource scratchdisk "D:/IBM_NODE_CONFIG/Scratch" {pools ""}
    }
    node "node2"
    {
        fastname "dsprdsrv1"
        pools ""
        resource disk "F:/IBM_NODE_CONFIG/Datasets" {pools ""}
        resource scratchdisk "F:/IBM_NODE_CONFIG/Scratch" {pools ""}
    }
    node "node3"
    {
        fastname "dsprdsrv1"
        pools "" "sort"
        resource disk "E:/IBM_NODE_CONFIG/Datasets" {pools "" "sort"}
        resource scratchdisk "E:/IBM_NODE_CONFIG/Scratch" {pools "" "sort"}
    }
}

But I am not sure if this is a good way to do it.
Also, what would be the ideal size for each drive?
The server runs in a Hyper-V environment and is allocated 12 CPUs and 24 GB of memory.

Please advise.

BR
Dan

Posted: Thu Apr 21, 2016 7:22 am
by PaulVL
My advice is to put your scratch disk on a local drive if possible. Network drives will work, but they slow the interaction down.

As for size, it really depends on the quantity of data, the quantity and quality of sorts in a job, and the number of concurrent jobs.

We can't answer those for you.


I also like my parallelism in even numbers.

==============

I would leave your default.apt as a two-node configuration, and then create other apt files with various degrees of parallelism based on your data/job needs.
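
For illustration, a two-node default.apt could look roughly like the sketch below. It reuses the host name and the D:/F: paths from the original post; which two drives to use is an assumption here, so swap in whichever drives are local on your server. The comments use the C-style /* */ form.

```
{
    /* Sketch only: two logical nodes on the same host (dsprdsrv1). */
    /* Assumption: D: and F: are local drives on this server. */
    node "node1"
    {
        fastname "dsprdsrv1"
        pools ""
        resource disk "D:/IBM_NODE_CONFIG/Datasets" {pools ""}
        resource scratchdisk "D:/IBM_NODE_CONFIG/Scratch" {pools ""}
    }
    node "node2"
    {
        fastname "dsprdsrv1"
        pools ""
        resource disk "F:/IBM_NODE_CONFIG/Datasets" {pools ""}
        resource scratchdisk "F:/IBM_NODE_CONFIG/Scratch" {pools ""}
    }
}
```

Both nodes share the same fastname because they are logical partitions on one physical server, not separate machines.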

I like to create project_dev.apt, project_tst.apt and project_prd.apt and set each one as the default in the corresponding project. It helps in the future when different projects have different default resource requirements. Project #2 might come along with its own scratch disk because its owner purchased it and, politically, it can't interfere with project #1 running on the same box, etc.

project_tst_4Nodes.apt, project_tst_6Nodes.apt, etc...

You always want a variety of config files to suit your needs. If you were on a grid, you would handle that variety via parameters. But you are not, so build them up ahead of time. Having a 1-node apt file is good for jobs that simply submit a stored procedure on a database, for instance. No data travels through DataStage, so there is no need for more than one thread.
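
As a rough sketch, a 1-node file for that kind of job might look like this. The file name project_prd_1Node.apt is hypothetical (it just follows the naming pattern above), and the D: paths are taken from the original post on the assumption that D: is a local drive.

```
{
    /* Hypothetical project_prd_1Node.apt: a single node, so no parallel partitions. */
    /* Assumption: D: is a local drive with the directories from the original post. */
    node "node1"
    {
        fastname "dsprdsrv1"
        pools ""
        resource disk "D:/IBM_NODE_CONFIG/Datasets" {pools ""}
        resource scratchdisk "D:/IBM_NODE_CONFIG/Scratch" {pools ""}
    }
}
```

A job picks up a file like this when its APT_CONFIG_FILE setting points at it instead of the project default.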