APT_CONFIG_FILE
Moderators: chulett, rschirm, roy
Hi,
Do you know the relationship (or formula) between the number of nodes that can be used in a project and the actual number of nodes available on the machine in the parallel edition of DataStage?
Thanks
Champa
And to say it another way.
When a DS job runs, it spawns an OSH process called the conductor. The conductor starts a section leader for each node defined in your configuration file, and each section leader in turn spawns multiple OSH processes, called players, for the stages in the job. Because of the pipeline-parallelism architecture, all stages start at the same time and are ready to process data as soon as it becomes available; exactly when that happens depends on several factors, for example whether a given object runs in sequential or parallel mode.
That's why you need to pay attention when defining an APT configuration file. Just because you have 4 cores does not mean you should define 4 nodes. Depending on how many jobs run concurrently, the types of jobs, and the number and types of stages, you could easily exhaust your CPU. The number of available disk controllers comes into play as well, as does memory.
DataStage is process intensive as opposed to thread intensive.
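To make that concrete, here is a minimal sketch of the kind of file APT_CONFIG_FILE points to, defining two logical nodes. The hostname and paths (etlserver, /data/ds/..., /scratch/ds/...) are placeholders for illustration, not values from this thread. Note that both nodes name the same physical host: the node count in the config is purely logical, which is exactly why it need not match the number of cores on the machine.

```
{
    node "node1"
    {
        fastname "etlserver"
        pools ""
        resource disk "/data/ds/node1" {pools ""}
        resource scratchdisk "/scratch/ds/node1" {pools ""}
    }
    node "node2"
    {
        fastname "etlserver"
        pools ""
        resource disk "/data/ds/node2" {pools ""}
        resource scratchdisk "/scratch/ds/node2" {pools ""}
    }
}
```

A job run against this file gets one section leader per node (two here), each spawning its own set of player processes, so doubling the node count roughly doubles the number of processes competing for CPU, memory, and disk controllers.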
"Don't let the bull between you and the fence"
Thanks
Gregg J Knight
"Never Never Never Quit"
Winston Churchill