APT_CONFIG_FILE

Champa · Post by **Champa** » Fri Apr 29, 2011 10:47 am

Hi,

Do you know the relationship/formula between the number of nodes that can be used in a project and the actual number of nodes available on the machine in the parallel edition of DataStage?

Thanks

jwiles · Post by **jwiles** » Fri Apr 29, 2011 11:10 am

I'm not quite sure what it is you're looking for here. Perhaps you can restate your request to more clearly define the what and why?

Regards,

chulett · Post by **chulett** » Fri Apr 29, 2011 11:45 am

There isn't one. 'Node' in a config file is a logical concept and has no direct correlation to any physical nodes your servers may or may not have.

ray.wurlod · Post by **ray.wurlod** » Fri Apr 29, 2011 3:12 pm

To put it another way, machines don't have nodes.

Champa · Post by **Champa** » Sat Apr 30, 2011 5:17 am

Thank you all for clarifying.

chulett · Post by **chulett** » Sat Apr 30, 2011 6:46 am

It's been discussed here before, so other posts should have additional information. Biggest thing to take away is that 'node' in the config file is more akin to 'thread' than machine or CPU.
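To make the "logical node" point concrete, here is a minimal sketch that prints a configuration file defining several nodes that all point at the same host. The host name `etlhost` and the two directory paths are made up for illustration, and the layout simply follows the usual `node` / `fastname` / `pools` / `resource` shape; check it against your own installation's default config before relying on it.

```python
def build_config(fastname: str, node_count: int, disk: str, scratch: str) -> str:
    """Return config file text with node_count logical nodes on ONE host.

    Every node shares the same fastname, which is the point: the node count
    is a degree-of-parallelism setting, not a count of machines or CPUs.
    """
    lines = ["{"]
    for i in range(1, node_count + 1):
        lines += [
            f'    node "node{i}"',
            "    {",
            f'        fastname "{fastname}"',
            '        pools ""',
            f'        resource disk "{disk}" {{pools ""}}',
            f'        resource scratchdisk "{scratch}" {{pools ""}}',
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical host and paths, for illustration only.
    print(build_config("etlhost", 4, "/data/ds/datasets", "/data/ds/scratch"))
```

Changing `4` to `2` or `8` changes nothing about the hardware; it only changes how many ways the engine partitions the work.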

greggknight · Post by **greggknight** » Sat Apr 30, 2011 9:20 pm

And to say it another way.
When a DS job runs, it spawns an OSH process for each node defined in your config. This process is called the conductor. Each conductor will spawn multiple OSH processes called players for the objects in the job. Due to the pipeline parallelism architecture, all stages start at the same time and are ready to process data as soon as it is available. How much work each one does depends on several factors, for example whether an object runs in sequential or parallel mode.

That's why you need to pay attention when defining an APT config file. Just because you have 4 cores does not mean that you should define 4 nodes. Depending on how many jobs are running concurrently, the types of jobs, and the number and types of stages, you could easily overload your CPUs. The number of disk controllers available also comes into play, as does memory.

DataStage is process intensive as opposed to thread intensive.

zulfi123786 · Post by **zulfi123786** » Sun May 01, 2011 3:52 am

greggknight wrote: And to say it another way.
Each conductor will spawn multiple OSH processes called players for the objects in the job.

I guess the above should be:

Each Section Leader will spawn multiple OSH processes called players for the objects in the job.
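As a rough illustration of why the node count matters, here is a back-of-the-envelope sketch. It assumes the commonly described process model: one conductor per job, one section leader per logical node, and one player per operator per node (a single player for an operator that runs sequentially). The function name and parameters are my own, and the result is only an upper-bound estimate, since the engine may combine operators into one process or insert extra ones.

```python
def estimate_processes(nodes: int, parallel_ops: int, sequential_ops: int = 0) -> int:
    """Rough upper bound on OSH processes for ONE running job.

    Assumed model (a simplification, not an exact engine rule):
      1 conductor for the job
      1 section leader per logical node in the config file
      1 player per parallel operator per node
      1 player per sequential operator
    """
    conductor = 1
    section_leaders = nodes
    players = nodes * parallel_ops + sequential_ops
    return conductor + section_leaders + players


if __name__ == "__main__":
    # Same 5-operator job under a 2-node and a 4-node config.
    print(estimate_processes(nodes=2, parallel_ops=5))  # 1 + 2 + 10
    print(estimate_processes(nodes=4, parallel_ops=5))  # 1 + 4 + 20
```

Multiply that by the number of jobs running at the same time and it becomes clear how a generous node count can swamp a box.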

zulfi123786 · Post by **zulfi123786** » Sun May 01, 2011 3:57 am

I recall the documentation suggesting that you start with n/2 nodes in the config file, where n is the number of physical processors, and then fine-tune the number for optimum performance.

Again, it is just a suggestion for the layman. I have never seen any project follow it.
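For what it is worth, the n/2 rule of thumb amounts to the following. This is only a sketch of the starting point described above; the function name is hypothetical, and rounding down with a floor of one node is my own assumption for odd or very small processor counts.

```python
def starting_node_count(physical_processors: int) -> int:
    """Starting point for the number of logical nodes: n/2, at least 1.

    This is only where tuning begins; the final number depends on
    concurrent jobs, stage mix, disk and memory.
    """
    if physical_processors < 1:
        raise ValueError("physical_processors must be at least 1")
    return max(1, physical_processors // 2)


if __name__ == "__main__":
    for n in (1, 4, 8, 16):
        print(n, "processors ->", starting_node_count(n), "nodes to start with")
```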