jobs on 4 x 4 node PX: Very slow & finally failing

dsproj2003 · Post by **dsproj2003** » Thu Jul 31, 2003 9:12 pm

Hi,

Job performance keeps degrading badly. We rebooted the DataStage server and brought the DataStage engine back up.

However, the performance problem persists, and the jobs abort very frequently.

The job log shows the following messages:
node_node3: Unable to start ORCHESTRATE process on node node3 (bidev1): APT_PMPlayer::APT_PMPlayer: fork() failed, Not enough space
main_program: The Section Leader on node node3 has terminated unexpectedly.
node_node1: Unable to start ORCHESTRATE process on node node1 (bidev1): APT_PMPlayer::APT_PMPlayer: fork() failed, Not enough space
node_node2: Unable to start ORCHESTRATE process on node node2 (bidev1): APT_PMPlayer::APT_PMPlayer: fork() failed, Not enough space
node_node4: Unable to start ORCHESTRATE process on node node4 (bidev1): APT_PMPlayer::APT_PMPlayer: fork() failed, Not enough space

What is the resolution to this?

Any pointers are most welcome.

Thanks.

Regards,
Nitin

chulett · Post by **chulett** » Thu Jul 31, 2003 9:24 pm

First off, I don't have PX. But a 'fork' message of this type means a new process couldn't be forked (created) because of a lack of 'space', and space here typically means swap space or RAM. It could be something else, like hitting the limit of a configuration parameter, but let's start there.

System specs? Any issues with RAM or swap you are aware of?
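
To make the two usual causes concrete: fork() normally fails with either ENOMEM (memory or swap exhausted) or EAGAIN (a process limit was hit). On some UNIX systems the message text for ENOMEM is "Not enough space", which would match the log above, though that is an assumption about this particular server. A minimal Python sketch, where `explain_fork_failure` is a hypothetical helper and not part of DataStage:

```python
import errno


def explain_fork_failure(err: int) -> str:
    """Map the errno from a failed fork() to a likely cause (illustrative only)."""
    if err == errno.ENOMEM:
        # Kernel could not reserve memory/swap for the child process.
        return "memory or swap exhausted"
    if err == errno.EAGAIN:
        # Per-user or system-wide process limit, or a transient shortage.
        return "process limit reached or resources temporarily unavailable"
    # Anything else: report the symbolic errno name if Python knows it.
    return "other: " + errno.errorcode.get(err, "unknown")
```

If the failure turns out to be ENOMEM, RAM and swap are the place to look; if it is EAGAIN, check the process limits instead.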

-craig

ray.wurlod · Post by **ray.wurlod** » Fri Aug 01, 2003 1:14 am

When a PX job starts, the first process is the Conductor, which starts a Section Leader process for each node in the configuration file. (It WAS called "Orchestrate" after all!) Each Section Leader then starts one or more Player (osh) processes to do the actual work, the number also depending on your partitioning scheme(s).
What the error message is telling you is that some space resource (memory or disk) is inadequate for the number of processes you are trying to start. Try running the job using a smaller configuration (perhaps only two nodes) and monitor memory and disk space consumption. The scale-up factor to four nodes is approximately linear.
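
As a rough sketch of that linear scale-up, the function below extrapolates a measurement taken on a small configuration to a larger one. The function name and the figures in the usage note are made up for illustration; only the linear assumption comes from the advice above, and real jobs may deviate from it.

```python
def estimate_usage(measured_mb: float, measured_nodes: int, target_nodes: int) -> float:
    """Extrapolate resource usage, assuming it grows linearly with node count."""
    if measured_nodes <= 0 or target_nodes <= 0:
        raise ValueError("node counts must be positive")
    # Linear assumption: usage per node stays roughly constant.
    return measured_mb * target_nodes / measured_nodes


def fits(estimated_mb: float, available_mb: float) -> bool:
    """Return True if the estimated usage is within the available resource."""
    return estimated_mb <= available_mb
```

For example, if a two-node run peaked at a (hypothetical) 1500 MB, `estimate_usage(1500.0, 2, 4)` gives 3000.0 MB for four nodes, which can then be compared against the free memory plus swap on the server with `fits`.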

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518

Teej · Post by **Teej** » Fri Aug 08, 2003 1:50 pm

quote: Originally posted by Ray.Wurlod
Try running the job using a smaller configuration (perhaps only two nodes) and monitor memory and disk space consumption. The scale-up factor to four nodes is approximately linear.

To add to this, also make sure your kernel is built with at least the minimum configuration recommended by Ascential, as detailed in the "Install and Upgrade Guide". It makes a dramatic difference in terms of resource availability.

-T.J.

* * *

... now if this can make breakfast, my life is complete.

bigpoppa · Post by **bigpoppa** » Mon Aug 18, 2003 3:23 pm

I believe that this sometimes also happens when the score file is too long. Breaking up the score file might be the next step for you if the previous suggestions do not work.