APT_CONFIG_FILE

Posted: Fri Apr 29, 2011 10:47 am
by Champa
Hi,

Do you know what the relationship or formula is between the number of nodes that can be used in a project and the actual number of nodes available on the machine in the parallel edition of DataStage?

Thanks

Posted: Fri Apr 29, 2011 11:10 am
by jwiles
I'm not quite sure what it is you're looking for here. Perhaps you can restate your request to more clearly define the what and why?

Regards,

Posted: Fri Apr 29, 2011 11:45 am
by chulett
There isn't one. 'Node' in a config file is a logical concept and has no direct correlation to any physical nodes your servers may or may not have.

Posted: Fri Apr 29, 2011 3:12 pm
by ray.wurlod
To put it another way, machines don't have nodes.

Posted: Sat Apr 30, 2011 5:17 am
by Champa
Thank you all for clarifying.

Posted: Sat Apr 30, 2011 6:46 am
by chulett
It's been discussed here before, so other posts should have additional information. Biggest thing to take away is that 'node' in the config file is more akin to 'thread' than machine or CPU.
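To illustrate the point that nodes are logical, here is a minimal Python sketch that renders a configuration file with any number of nodes all pointing at the same host. The host name `etl-host` and the two resource paths are hypothetical placeholders, and the layout follows the commonly seen form of a parallel configuration file, so check it against your own installation before relying on it.

```python
def render_config(node_count, fastname="etl-host",
                  disk="/data/ds/disk", scratch="/data/ds/scratch"):
    """Render a config file with node_count logical nodes on ONE host.

    fastname, disk and scratch are hypothetical placeholders.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    lines = ["{"]
    for i in range(1, node_count + 1):
        # Every logical node reuses the same physical host name.
        lines += [
            f'  node "node{i}"',
            "  {",
            f'    fastname "{fastname}"',
            '    pools ""',
            f'    resource disk "{disk}" {{pools ""}}',
            f'    resource scratchdisk "{scratch}" {{pools ""}}',
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)


print(render_config(2))
```

Nothing in the file ties the number of `node` entries to the number of CPUs or servers; the same host can back two nodes or eight.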

Posted: Sat Apr 30, 2011 9:20 pm
by greggknight
And to say it another way.
When a DS job runs, it spawns an OSH process for each node defined in your config. This process is called the conductor. Each conductor will spawn multiple OSH processes called players for the objects in the job. Due to the pipeline parallelism architecture, all stages start at the same time and are ready to process data as soon as it is available. When data becomes available depends on several factors, for example whether an object runs in sequential or parallel mode.

That's why you need to pay attention when defining an APT config file. Just because you have 4 cores does not mean that you should define 4 nodes. Depending on how many jobs are running concurrently, the type of jobs, and the number and types of stages, you could easily overload your CPU. The number of disk controllers available also comes into play, as does memory.

DataStage is process-intensive as opposed to thread-intensive.

Posted: Sun May 01, 2011 3:52 am
by zulfi123786
greggknight wrote: And to say it another way.
Each conductor will spawn multiple OSH processes called players for the objects in the job.

I guess the above should be:

Each Section Leader will spawn multiple OSH processes called players for the objects in the job.
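As a rough illustration of why the node count matters, the sketch below estimates how many OSH processes one job might start. It assumes the commonly described model of one conductor per job, one section leader per node, and one player per stage per node, with a sequential stage contributing a single player. The function name and parameters are hypothetical, and real counts will differ because the engine can combine operators or insert its own.

```python
def estimate_processes(nodes, parallel_stages, sequential_stages=0):
    """Rough upper-bound estimate of OSH processes for one job.

    Assumed model (not taken from the product documentation):
      1 conductor per job
      1 section leader per node
      1 player per parallel stage per node
      1 player per sequential stage
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    conductor = 1
    section_leaders = nodes
    players = nodes * parallel_stages + sequential_stages
    return conductor + section_leaders + players


# Same 10-stage job under two different config files.
for n in (2, 4):
    print(n, "nodes:", estimate_processes(n, parallel_stages=10))
```

Under these assumptions the process count grows almost linearly with the node count, which is why several concurrent jobs on a generous config file can saturate a machine.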

Posted: Sun May 01, 2011 3:57 am
by zulfi123786
I recollect the documentation suggesting n/2 nodes in the config file, where n is the number of physical processors, and then fine-tuning the number for optimum performance.

Again, it is just a starting suggestion for the layman. I have never seen any project follow it.
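The n/2 rule of thumb is simple enough to write down. This is only a sketch of the starting point described above; the function name is hypothetical, and rounding down with a floor of one node is an assumption for odd or very small processor counts.

```python
def starting_node_count(physical_processors):
    """Starting point from the n/2 suggestion; tune from here.

    Rounds down and never returns fewer than one node (an assumption).
    """
    if physical_processors < 1:
        raise ValueError("physical_processors must be at least 1")
    return max(1, physical_processors // 2)


print(starting_node_count(8))
```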