Sequencer Aborts Due to Error Code = -14

Posted: Mon Jan 16, 2006 6:07 pm
by oacvb
Hi,

I have a sequencer that calls three jobs. While it was calling one of them, the sequencer aborted and I found the following log messages. Please help me resolve this.


72 STARTED Tue Jan 17 03:19:08 2006
Starting Job DS_MNP_MNP.20060117_031852. (...)
73 INFO Tue Jan 17 03:19:08 2006
Environment variable settings: (...)
74 INFO Tue Jan 17 03:19:09 2006
DS_MNP_MNP.20060117_031852.JobControl (@Coordinator): Starting new run of checkpointed Sequence job
75 BATCH Tue Jan 17 03:19:10 2006
DS_MNP_MNP -> (DS_MNP_MNP_XFM.20060117_031852): Job run requested (...)
76 INFO Tue Jan 17 03:19:10 2006
DS_MNP_MNP.20060117_031852.JobControl (DSRunJob): Waiting for job DS_MNP_MNP_XFM.20060117_031852 to start
77 WARNING Tue Jan 17 03:20:12 2006
DS_MNP_MNP.20060117_031852.JobControl (@JOB_DS_MNP_MMP): Controller problem: Error calling DSRunJob(DS_MNP_MNP_XFM.20060117_031852), code=-14 (...)

Regards,
O.A.C.

Re: Sequencer Aborts Due to Error Code = -14

Posted: Mon Jan 16, 2006 9:15 pm
by loveojha2
oacvb wrote:Hi,
DS_MNP_MNP.20060117_031852.JobControl (DSRunJob): Waiting for job DS_MNP_MNP_XFM.20060117_031852 to start
77 WARNING Tue Jan 17 03:20:12 2006
DS_MNP_MNP.20060117_031852.JobControl (@JOB_DS_MNP_MMP): Controller problem: Error calling DSRunJob(DS_MNP_MNP_XFM.20060117_031852), code=-14 (...)

Regards,
O.A.C.
It looks like your machine is too busy to start a new child process (error code -14).

Posted: Mon Jan 16, 2006 11:03 pm
by kumar_s
Hi,

As loveojha2 mentioned, one possible reason is that the server is fully loaded.
But please also tell us the status of the job DS_MNP_MNP_XFM at the time of the abort.

-Kumar

Posted: Tue Jan 17, 2006 1:18 am
by oacvb
Hi,

The child job (DS_MNP_MNP_XFM) was not called at all.

Regards,
O.A.C.

Posted: Tue Jan 17, 2006 2:09 am
by kumar_s
Hi,

What happens if you rerun?
Is the problem persistent, or does it occur randomly?

-Kumar

Posted: Tue Jan 17, 2006 8:14 am
by chulett
An error 14 is a timeout. It is a symptom of an overloaded machine - i.e. too many processes / jobs attempting to run at the same time. This could be because the hardware is underpowered for the load being put on it or perhaps because some tunables are set too low.

You haven't given us any clues about your hardware other than it's UNIX. What are your system specs? Have you done any uvconfig tuning or is everything pretty much 'out of the box'?

Posted: Tue Jan 17, 2006 3:32 pm
by ray.wurlod
Everyone knows that DataStage server machines have infinite capacity and it's perfectly OK to run thousands of jobs simultaneously! :twisted:

Posted: Thu Mar 16, 2006 1:56 pm
by mauherga
Hi ray,

Is it true??
seriously?

Posted: Thu Mar 16, 2006 2:05 pm
by I_Server_Whale

The trouble was that he was talking in philosophy, but they were listening in gibberish.
-- Terry Pratchett
:lol:

Naveen.

Issue may be resolved

Posted: Wed Apr 18, 2007 11:04 am
by rfwoods
This issue may be resolved by a patch for the 7.x releases.
Reference: composite patch for ecases 63861 and 70788.

This patch is specific to the release level (7.0, 7.5.1, 7.5.2, etc.). You must obtain the binary object code from IBM for each release.

Posted: Wed Apr 18, 2007 11:19 am
by chulett
All the 'patch' could do is increase the timeout, I would think - is that what you mean here? It won't address the fundamental problem.

Posted: Wed Apr 18, 2007 11:33 am
by rfwoods
I guess that would depend on your definition of the problem:
A) Jobs aborting
B) Finite nature of computer hardware

Following is a portion of the patch readme:

--------------------------------------------------------------------
Implemented Solution:
---------------------

It transpired that the problem of orphaned stagerun processes only occurred when jobs were stopped or aborted. The DSD_RUN code has been modified to explicitly terminate stages if they do not voluntarily terminate. This occurs after a predefined timeout period, formerly 60 seconds and now configurable.

As this behaviour is a significant change in functionality from the original design, it is under the control of environment variables.

New environment variables to control aspects of this patch
----------------------------------------------------------
DSForceTerminate - If unset or set to zero, the product behaves as at present with respect to the termination of processes. Setting a value of 1 brings the new forced termination code into play.

DSWaitShutdown - If unset or less than 60, the timeout is set to 60 seconds; this has been the default for most customers. Setting a higher figure results in job termination waiting for the specified number of seconds before the run process terminates or the stages are forcibly shut down (depending on the above environment variable).

DSWaitStartup - As for the shutdown variable; however, this controls the amount of time the startup procedure will wait to determine that a stage has started before it logs a failure.

--------------------------------------------------
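
To make the rules in the readme excerpt concrete, here is a minimal Python sketch of how the three variables might be interpreted. It is not DataStage code: the helper names `effective_wait` and `force_terminate_enabled` are hypothetical, and two points are assumptions rather than anything the readme states: that DSWaitStartup has the same 60-second floor as DSWaitShutdown (the readme only says "as for the shutdown variable"), and that a non-numeric value is treated as unset.

```python
import os

# Floor the readme states for DSWaitShutdown; assumed to apply to
# DSWaitStartup as well.
DEFAULT_WAIT_SECONDS = 60


def effective_wait(name, env=None):
    """Return the timeout in seconds implied by the readme's rule for `name`."""
    env = os.environ if env is None else env
    raw = env.get(name, "")
    try:
        value = int(raw)
    except ValueError:
        # Unset or non-numeric: fall back to the default (an assumption).
        return DEFAULT_WAIT_SECONDS
    # Values below 60 are raised to 60, as described for DSWaitShutdown.
    return max(value, DEFAULT_WAIT_SECONDS)


def force_terminate_enabled(env=None):
    """True only when DSForceTerminate is set to 1; unset or 0 keeps old behaviour."""
    env = os.environ if env is None else env
    return env.get("DSForceTerminate", "0").strip() == "1"


if __name__ == "__main__":
    # Hypothetical settings used purely for illustration.
    settings = {
        "DSForceTerminate": "1",
        "DSWaitShutdown": "30",
        "DSWaitStartup": "180",
    }
    print(force_terminate_enabled(settings))
    print(effective_wait("DSWaitShutdown", settings))
    print(effective_wait("DSWaitStartup", settings))
```

With these illustrative settings, a DSWaitShutdown of 30 would still behave as 60 seconds, while a DSWaitStartup of 180 would be honoured as given. How the product treats DSForceTerminate values other than 0 or 1 is not stated in the excerpt, so the sketch treats only 1 as enabling forced termination.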

Posted: Wed Apr 18, 2007 11:49 am
by chulett
Well now... that's a little different - they are addressing the problem with an acknowledged 'significant change' to How Things Work in this patch it would seem.

Posted: Fri Apr 20, 2007 2:24 am
by venkatachalamsri
Hi,

I am new to DataStage administration and am facing the same issue. How can I find out which patches are installed on a DataStage UNIX box?

With Regards.
Venkat

Posted: Fri Apr 20, 2007 4:12 am
by ray.wurlod
Ask whoever installed it. Look in the system log - the documentation of all changes to the system. Properly done, it should be a book chained to the console.