
Jobs failing when running in parallel

Posted: Tue Aug 03, 2010 11:54 am
by pradeep9081
Hi,

We are running multiple jobs (5-6 jobs) at a time from the scheduler.
The jobs are failing with the error below:

Unable to start ORCHESTRATE process on node node1 (nsyrp41b): APT_PMPlayer::APT_PMPlayer: fork() failed, Not enough space.

If I run the jobs individually, they work fine.

If a job contains a lookup, we get the error below:

"/eeadm2/IBM/InformationServer/Server/Datasets/lookuptable.20100803.jm02qdd": No space left on device
APT_BufferOperator: Add block to queue failed. This means that your buffer file systems all ran out of file space, or that some other system error occurred. Please ensure that you have sufficient scratchdisks in either the default or "buffer" pools on all nodes in your configuration file.

We have a 2-node configuration file in dev, with both nodes pointing to the same scratch disk space.

Is this due to the buffer size? What is the best resolution?

Re: Jobs failing when running in parallel

Posted: Tue Aug 03, 2010 12:04 pm
by kris007
pradeep9081 wrote: "/eeadm2/IBM/InformationServer/Server/Datasets/lookuptable.20100803.jm02qdd": No space left on device
APT_BufferOperator: Add block to queue failed. This means that your buffer file systems all ran out of file space, or that some other system error occurred. Please ensure that you have sufficient scratchdisks in either the default or "buffer" pools on all nodes in your configuration file.

We have a 2-node configuration file in dev, with both nodes pointing to the same scratch disk space.

Is this due to the buffer size? What is the best resolution?
Yes. The error message says it all. Your scratch disk space is full. You need to add extra space or reschedule your jobs so that they don't run at the same time.
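
For reference, a configuration file along the lines below gives each node its own scratchdisk and assigns it to both the default and "buffer" pools, which is what the error message is asking for. This is only a minimal sketch: the /scratch1 and /scratch2 paths are made-up mount points, and the fastname and resource disk path are taken from your error messages; substitute whatever filesystems actually have free space on your server.

{
	node "node1"
	{
		fastname "nsyrp41b"
		pools ""
		resource disk "/eeadm2/IBM/InformationServer/Server/Datasets" {pools ""}
		resource scratchdisk "/scratch1" {pools "" "buffer"}
	}
	node "node2"
	{
		fastname "nsyrp41b"
		pools ""
		resource disk "/eeadm2/IBM/InformationServer/Server/Datasets" {pools ""}
		resource scratchdisk "/scratch2" {pools "" "buffer"}
	}
}

Pointing both nodes at the same scratch filesystem is legal, but then all your concurrent jobs compete for one pool of space; separate filesystems (or simply more space on the shared one) is what actually relieves the pressure.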

Re: Jobs failing when running in parallel

Posted: Wed Aug 04, 2010 1:21 am
by mouthou
Or, if disk space management is out of your control, the job(s) can be modified to use a Join stage instead of a Lookup, or to apply filter conditions so that less reference data goes into the lookup :idea:

Re: Jobs failing when running in parallel

Posted: Wed Aug 04, 2010 1:22 am
by Barath
pradeep9081 wrote: We are running multiple jobs (5-6 jobs) at a time from the scheduler.
...
If a job contains a lookup, we get the error below:

"/eeadm2/IBM/InformationServer/Server/Datasets/lookuptable.20100803.jm02qdd": No space left on device

Is this due to the buffer size? What is the best resolution?
Which partitioning are you using? If it is Entire, change it to Hash; Entire copies the complete reference data to every node, which eats space quickly. If you are still having space issues after that, then you need to add space, as kris007 correctly says.