jobs on 4 x 4 node PX: Very slow & finally failing

dsproj2003
Participant
Posts: 21
Joined: Wed Oct 01, 2003 11:53 am

Post by dsproj2003 »

Hi,

Job performance keeps getting very bad. We rebooted the DataStage server and brought the DataStage engine back up.

However, the performance problem persists, and the jobs abort very frequently.

The jobs give the following messages:
node_node3: Unable to start ORCHESTRATE process on node node3 (bidev1): APT_PMPlayer::APT_PMPlayer: fork() failed, Not enough space
main_program: The Section Leader on node node3 has terminated unexpectedly.
node_node1: Unable to start ORCHESTRATE process on node node1 (bidev1): APT_PMPlayer::APT_PMPlayer: fork() failed, Not enough space
node_node2: Unable to start ORCHESTRATE process on node node2 (bidev1): APT_PMPlayer::APT_PMPlayer: fork() failed, Not enough space
node_node4: Unable to start ORCHESTRATE process on node node4 (bidev1): APT_PMPlayer::APT_PMPlayer: fork() failed, Not enough space

What is the resolution to this?

Any pointers to this are most welcome.

Thanks.

Regards,
Nitin
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

First off, I don't have PX. But a 'fork' message of this type means a new process (or pipe, etc.) couldn't be forked (created) because of a lack of 'space', and space here typically means swap space or RAM. It could be something else, such as hitting the limit of a configuration parameter, but let's start there.

System specs? Any issues with RAM or swap you are aware of?
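
To make the distinction above concrete, here is a minimal Python sketch (not DataStage code) of how a failed fork() reports its cause. On several commercial Unix systems, "Not enough space" is the message text for ENOMEM; that the poster's platform is one of them is an assumption. The function names `explain_fork_error` and `try_fork` are made up for illustration.

```python
import errno
import os


def explain_fork_error(err: OSError) -> str:
    """Map the errno from a failed fork() to a likely resource to check."""
    if err.errno == errno.ENOMEM:
        # ENOMEM: the kernel could not reserve memory or swap for the child.
        return "ENOMEM: check free RAM and swap space"
    if err.errno == errno.EAGAIN:
        # EAGAIN: a process-count limit (per user or system-wide) was reached.
        return "EAGAIN: check per-user and system-wide process limits"
    return f"errno {err.errno}: {os.strerror(err.errno)}"


def try_fork() -> str:
    """Attempt one fork and report the outcome from the parent's side."""
    try:
        pid = os.fork()
    except OSError as err:
        return explain_fork_error(err)
    if pid == 0:
        os._exit(0)  # child: exit immediately without running cleanup handlers
    os.waitpid(pid, 0)  # parent: reap the child so no zombie is left behind
    return "fork succeeded"
```

If the errno turns out to be EAGAIN rather than ENOMEM, the configuration-parameter route (process limits) is probably the better place to look first.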

-craig
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

When a PX job starts, the first process is the Conductor, which starts a Section Leader process for each node in the configuration file. (It WAS called "Orchestrate" after all!) Each Section Leader may then start one or more Player (osh) processes to do the actual work, the number also depending on your partitioning scheme(s).
What the error message is telling you is that some space resource (memory or disk) is inadequate for the number of processes you are trying to start. Try running the job using a smaller configuration (perhaps only two nodes) and monitor memory and disk space consumption. The scale-up factor to four nodes is approximately linear.
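
As a rough illustration of why the scale-up is close to linear, the sketch below counts processes under a simplified model: one Conductor, one per-node leader process (the "Section Leader" named in the error log), and one worker process per operator on each node. The model ignores operator combination, and `operators_per_node=10` is an invented figure, not taken from the failing job.

```python
def estimate_process_count(nodes: int, operators_per_node: int) -> int:
    """Rough count of OS processes for one parallel job run.

    Assumed model: 1 conductor, 1 leader process per node, and one worker
    process per operator on each node (no operator combination).
    """
    if nodes < 1 or operators_per_node < 0:
        raise ValueError("nodes must be >= 1 and operators_per_node >= 0")
    conductor = 1
    leaders = nodes
    workers = nodes * operators_per_node
    return conductor + leaders + workers


# Compare a two-node and a four-node configuration for the same job.
for n in (2, 4):
    print(n, estimate_process_count(n, operators_per_node=10))
# prints "2 23" then "4 45"
```

Under this model, going from two nodes to four nearly doubles the process count, so memory and swap demand measured on two nodes can be roughly doubled to estimate the four-node case.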


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

quote: Originally posted by Ray.Wurlod
Try running the job using a smaller configuration (perhaps only two nodes) and monitor memory and disk space consumption. The scale-up factor to four nodes is approximately linear.


To add to this, also ensure that your kernel is compiled with at least the minimum configuration recommended by Ascential, as detailed in the "Install and Upgrade Guide". It makes a dramatic difference in terms of resource availability.

-T.J.


* * *

... now if this can make breakfast, my life is complete.
bigpoppa
Participant
Posts: 190
Joined: Fri Feb 28, 2003 11:39 am

Post by bigpoppa »

I believe that this sometimes also happens when the score file is too long. Breaking up the score file might be the next step for you if the previous suggestions do not work.