Sequencer Aborts Due to Error Code = -14
Hi,
I have a sequencer that calls 3 jobs. During the call to one of the jobs the sequencer aborted, and I found the following log messages. Please help me resolve this.
72 STARTED Tue Jan 17 03:19:08 2006
Starting Job DS_MNP_MNP.20060117_031852. (...)
73 INFO Tue Jan 17 03:19:08 2006
Environment variable settings: (...)
74 INFO Tue Jan 17 03:19:09 2006
DS_MNP_MNP.20060117_031852.JobControl (@Coordinator): Starting new run of checkpointed Sequence job
75 BATCH Tue Jan 17 03:19:10 2006
DS_MNP_MNP -> (DS_MNP_MNP_XFM.20060117_031852): Job run requested (...)
76 INFO Tue Jan 17 03:19:10 2006
DS_MNP_MNP.20060117_031852.JobControl (DSRunJob): Waiting for job DS_MNP_MNP_XFM.20060117_031852 to start
77 WARNING Tue Jan 17 03:20:12 2006
DS_MNP_MNP.20060117_031852.JobControl (@JOB_DS_MNP_MMP): Controller problem: Error calling DSRunJob(DS_MNP_MNP_XFM.20060117_031852), code=-14 (...)
Regards,
O.A.C.
Re: Sequencer Aborts Due to Error Code = -14
Looks like your machine is too busy and is not able to start a new child process (Error Code -14).
Success consists of getting up just one more time than you fall.
An error -14 is a timeout. It is a symptom of an overloaded machine, i.e. too many processes or jobs attempting to run at the same time. This could be because the hardware is underpowered for the load being put on it, or perhaps because some tunables are set too low.
You haven't given us any clues about your hardware other than it's UNIX. What are your system specs? Have you done any uvconfig tuning or is everything pretty much 'out of the box'?
-craig
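If you want to see how often this happens, something like the following might help. It is only a rough sketch in Python, assuming the job log has been exported to plain text with the wrapped lines rejoined; the function name `find_dsrunjob_errors` is made up for illustration and is not part of DataStage. It picks out the "Error calling DSRunJob" entries and reports the job name and code, so timeouts (-14) can be counted.

```python
import re

# Matches entries such as:
#   Controller problem: Error calling DSRunJob(JOBNAME), code=-14
# The pattern is based only on the log excerpt posted in this thread.
DSRUNJOB_ERROR = re.compile(
    r"Error calling DSRunJob\((?P<job>[^)]*)\), code=(?P<code>-?\d+)"
)

def find_dsrunjob_errors(log_text):
    """Return a list of (job_name, error_code) tuples found in log_text."""
    return [
        (match.group("job"), int(match.group("code")))
        for match in DSRUNJOB_ERROR.finditer(log_text)
    ]

if __name__ == "__main__":
    sample = (
        "77 WARNING Tue Jan 17 03:20:12 2006\n"
        "DS_MNP_MNP.20060117_031852.JobControl (@JOB_DS_MNP_MMP): "
        "Controller problem: Error calling "
        "DSRunJob(DS_MNP_MNP_XFM.20060117_031852), code=-14 (...)\n"
    )
    for job, code in find_dsrunjob_errors(sample):
        # -14 is the timeout discussed above; other codes are not interpreted.
        note = "timeout" if code == -14 else "see product documentation"
        print(f"{job}: code={code} ({note})")
```

If the -14 entries cluster around the same time each night, that points at too many jobs being started together rather than at one bad job.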
"You can never have too many knives" -- Logan Nine Fingers
Issue may be resolved
This issue may be resolved by a patch for the 7.x releases.
Reference: composite patch for ecase 63861 and 70788.
This patch is specific to the release level (7.0, 7.5.1, 7.5.2, etc.). You must obtain the binary object code from IBM for each release.
I guess that would depend on your definition of the problem:
A) Jobs aborting
B) Finite nature of computer hardware
Following is a portion of the patch readme:
--------------------------------------------------------------------
Implemented Solution:
---------------------
It transpired that the problem of orphaned stagerun processes only occurred when jobs were stopped or aborted. The DSD_RUN code has been modified to explicitly terminate stages if they do not voluntarily terminate. This occurs after a predefined timeout period, formerly 60 seconds and now configurable.
As this behaviour is a significant change in functionality from the original design, it is under the control of environment variables.
New environment variables to control aspects of this patch
----------------------------------------------------------
DSForceTerminate - If unset or set to zero, the product behaves as at present with respect to the termination of processes. Setting a value of 1 will bring into play the new forced termination code.
DSWaitShutdown - If unset or less than 60, the timeout will be set to 60 seconds; this has been the default for most customers. Setting a higher figure will result in job termination waiting for the specified number of seconds before the run process terminates or the stages are forcibly shut down (depending on the above environment variable).
DSWaitStartup - As for the shutdown variable; however, this controls the amount of time the startup procedure will wait to determine that a stage has started before it logs a failure.
--------------------------------------------------
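To make the rules in the readme a bit more concrete, here is a small Python sketch of how the three variables might be interpreted. This is not the DSD_RUN code, just an illustration of the behaviour as described above. The helper names are hypothetical, the treatment of non-numeric values is my own assumption, and applying the same 60-second floor to DSWaitStartup is inferred from the "as for the shutdown variable" wording.

```python
import os

# Floor stated in the readme for DSWaitShutdown; assumed here to apply
# to DSWaitStartup as well.
MIN_WAIT_SECONDS = 60

def effective_wait(name, env=None):
    """Return the wait in seconds implied by the variable `name`.

    Unset, non-numeric (an assumption) or values below 60 all give 60.
    """
    env = os.environ if env is None else env
    raw = env.get(name, "")
    try:
        value = int(raw)
    except ValueError:
        value = 0
    return max(value, MIN_WAIT_SECONDS)

def force_terminate_enabled(env=None):
    """True only when DSForceTerminate is set to 1."""
    env = os.environ if env is None else env
    return env.get("DSForceTerminate", "0").strip() == "1"

if __name__ == "__main__":
    settings = {"DSWaitStartup": "180", "DSForceTerminate": "1"}
    print(effective_wait("DSWaitStartup", settings))
    print(effective_wait("DSWaitShutdown", settings))
    print(force_terminate_enabled(settings))
```

In other words, setting DSWaitStartup to something above 60 is the part most likely to matter for the startup timeout discussed in this thread, while the other two variables deal with stopping jobs.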