APT_CONFIG_FILE

Post questions here relating to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Champa
Participant
Posts: 88
Joined: Wed Dec 14, 2005 1:44 pm


Post by Champa »

Hi,

Do you know what the relationship (or formula) is between the number of nodes that can be used in a project and the actual number of nodes available on the machine in the Parallel Edition of DataStage?

Thanks
Champa
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

I'm not quite sure what it is you're looking for here. Perhaps you can restate your request to more clearly define the what and why?

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There isn't one. 'Node' in a config file is a logical concept and has no direct correlation to any physical nodes your servers may or may not have.
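To illustrate that a config-file node is logical, here is a minimal Python sketch (an editorial addition, not from the thread) that writes an APT_CONFIG_FILE body in which several logical nodes all point at the same physical host through `fastname`. The host name `etlhost`, the function name `make_apt_config`, and the disk/scratch paths are hypothetical placeholders; adjust them to your own installation.

```python
# Sketch: generate APT_CONFIG_FILE text where N logical nodes all map to
# ONE physical machine. Host name and paths are hypothetical placeholders.

def make_apt_config(host: str, node_count: int,
                    disk_root: str = "/data/ds",
                    scratch_root: str = "/scratch/ds") -> str:
    """Return config-file text with node_count logical nodes on one host."""
    if node_count < 1:
        raise ValueError("need at least one node")
    lines = ["{"]
    for i in range(1, node_count + 1):
        name = f"node{i}"
        lines += [
            f'\tnode "{name}"',
            "\t{",
            f'\t\tfastname "{host}"',  # every logical node names the same server
            '\t\tpools ""',
            # Doubled braces in the f-strings emit literal { and }.
            f'\t\tresource disk "{disk_root}/{name}" {{pools ""}}',
            f'\t\tresource scratchdisk "{scratch_root}/{name}" {{pools ""}}',
            "\t}",
        ]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Four logical nodes, one (hypothetical) physical server.
    print(make_apt_config("etlhost", 4))
```

Nothing in the file ties the node count to the number of CPUs or machines; the same server can host one node or many.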
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

To put it another way, machines don't have nodes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Champa
Participant
Posts: 88
Joined: Wed Dec 14, 2005 1:44 pm

Post by Champa »

Thank you all for clarifying.
Champa
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It's been discussed here before, so other posts should have additional information. Biggest thing to take away is that 'node' in the config file is more akin to 'thread' than machine or CPU.
-craig

"You can never have too many knives" -- Logan Nine Fingers
greggknight
Premium Member
Posts: 120
Joined: Thu Oct 28, 2004 4:24 pm

Post by greggknight »

And to say it another way.
When a DS job runs, it spawns an OSH process for each node defined in your config. This process is called the conductor. Each conductor will spawn multiple OSH processes called players for the objects in the job. Due to the pipeline parallelism architecture, all stages start at the same time and are ready to process data as it becomes available. When and how each stage processes that data depends on several factors, for example whether it runs in sequential mode or is a parallel type of object.

That's why you need to pay attention when defining an APT config file. Just because you have 4 cores does not mean you should define 4 nodes. Depending on how many jobs are running concurrently, the type of jobs, and the number and types of stages, you could easily exhaust your CPU. The number of available disk controllers also comes into play, as does memory.

DataStage is process-intensive as opposed to thread-intensive.
"Don't let the bull between you and the fence"

Thanks
Gregg J Knight

"Never Never Never Quit"
Winston Churchill
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

greggknight wrote:
And to say it another way.
Each conductor will spawn multiple OSH processes called players for the objects in the job.
I guess the above should be:

Each section leader will spawn multiple OSH processes called players for the objects in the job.
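To make the process-count point concrete, here is a rough Python sketch (an editorial addition, not from the thread). It assumes the commonly described model: per running job, one conductor, one section leader per logical node, one player per parallel operator per node, and one player per sequential operator. It deliberately ignores operator combination, which can merge several operators into a single player, so treat the result as an upper-bound estimate. The names `JobShape`, `estimate_processes`, and `total_processes` are hypothetical.

```python
# Rough process-count model (assumption, ignores operator combination):
#   1 conductor + 1 section leader per node
#   + 1 player per parallel operator per node
#   + 1 player per sequential operator.
from dataclasses import dataclass
from typing import List


@dataclass
class JobShape:
    nodes: int                     # logical nodes in the job's APT_CONFIG_FILE
    parallel_operators: int        # operators running in parallel mode
    sequential_operators: int = 0  # operators running on a single node


def estimate_processes(job: JobShape) -> int:
    """Estimated OSH processes for one running job."""
    conductor = 1
    section_leaders = job.nodes
    players = job.parallel_operators * job.nodes + job.sequential_operators
    return conductor + section_leaders + players


def total_processes(jobs: List[JobShape]) -> int:
    """Estimated OSH processes for several jobs running concurrently."""
    return sum(estimate_processes(j) for j in jobs)


if __name__ == "__main__":
    job = JobShape(nodes=4, parallel_operators=10, sequential_operators=1)
    print(estimate_processes(job))     # 1 + 4 + (40 + 1) = 46
    print(total_processes([job] * 3))  # three concurrent runs: 138
```

Comparing such an estimate against the available cores shows why defining one node per core can oversubscribe a machine once several jobs run at the same time.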
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

I recall the documentation suggesting n/2 nodes in the config file, where n is the number of physical processors, and then fine-tuning the number for optimum performance.

Again, it is just a suggestion for the layman. I have never seen any project follow it.
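As a small illustration of the n/2 rule of thumb recalled above, here is a Python sketch (an editorial addition). Rounding down and never going below one node are assumptions of this sketch, not part of the recalled guideline, and `starting_node_count` is a hypothetical name. The physical processor count is passed in explicitly because `os.cpu_count()` reports logical CPUs (including hyper-threads), not physical ones.

```python
# Starting-point heuristic: about n/2 logical nodes, where n is the
# number of PHYSICAL processors, then tune from there.
# Floor division and the minimum of 1 are assumptions of this sketch.

def starting_node_count(physical_processors: int) -> int:
    """Return max(1, n // 2) as an initial node count to tune from."""
    if physical_processors < 1:
        raise ValueError("physical_processors must be >= 1")
    return max(1, physical_processors // 2)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, "->", starting_node_count(n))
```

The result is only a first guess; concurrency, stage mix, disk controllers, and memory all shift the number that actually performs best.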