Page 1 of 1

Unexpected termination by Unix signal 9(SIGKILL)

Posted: Mon Sep 12, 2005 11:00 pm
by dh_Madhu
Hi,
Recently, while running a sequencer that has a loop stage and quite a few Command Execute stages inside the loop, I encountered a fatal error which goes like this:
1. Unexpected termination by Unix signal 9(SIGKILL)
followed by
2. Failure during execution of operator logic.

What could be the reason for this error?
This error surfaced when a Command Execute stage ran the following command:
/bin/date +"%r" which returns the current time (e.g. 10:00:32 AM).
Has this got something to do with Unix connection?
Should I use another CE stage to issue an "exit" command and try to please the Unix box? :roll:
The error often occurs towards the end of the loop (typically after about 25 iterations)...

Thanks in Advance.

Posted: Tue Sep 13, 2005 1:30 am
by ray.wurlod
Someone killed the process with kill -9

Posted: Tue Sep 13, 2005 4:50 am
by dh_Madhu
Ray,
Given that this does not arise consistently, could there be a flaw in the job design?
If so, could executing a series of CE stages bring the Unix box down?
Some time back, other members of the team were also hit by this error (again, only occasionally) when jobs were run using a third-party tool.

Posted: Tue Sep 13, 2005 6:45 am
by chulett
No, the SIGKILL (kill -9) is something that is either explicitly issued by a person or perhaps by a monitoring tool when it thinks Something Is Wrong.

DataStage itself won't do this. Perhaps this "third party tool" you've mentioned? Or perhaps by an Admin who is not all that DataStage aware? I've heard stories where exactly that has happened to people. :?

Posted: Tue Sep 13, 2005 7:08 am
by Sainath.Srinivasan
Sometimes the admin people get worried about the DS jobs and kill the process.

Posted: Tue Sep 13, 2005 4:00 pm
by ray.wurlod
They need to be educated out of this practice using minimum necessary force.

Posted: Tue Sep 13, 2005 8:49 pm
by dh_Madhu
I am quite sure that there is no intervention by any individual.
Also, these jobs run perfectly after they are recompiled and re-run.
Anyway, we have the Ascential team coming down to solve this problem, and I shall report back with what they say.
Thanks a lot guys.

Posted: Wed Sep 14, 2005 1:08 am
by ray.wurlod
Someone DID issue a kill -9 command, Madhu, but probably won't admit to it. This is the only way that this particular signal can be generated. No one creates applications or scripts to use this command; it's just too dangerous.

For example, take a look at $DSHOME/sample/ds.rc (the DataStage startup/shutdown script). It uses kill -15 which is a far more graceful approach to signalling a process to shut down.
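A quick illustration of the difference (a minimal sketch; the looping child process stands in for any long-running job): SIGTERM (15) can be trapped so the process shuts down cleanly, while SIGKILL (9) cannot be caught, blocked, or ignored, so the kernel terminates the process immediately.

```shell
#!/bin/sh
# kill -15 (SIGTERM) can be trapped for a graceful shutdown.
sh -c 'trap "echo graceful shutdown; exit 0" TERM
       while :; do sleep 1; done' &
pid=$!
sleep 1                 # give the child time to install its trap
kill -15 "$pid"
wait "$pid"
echo "status after SIGTERM: $?"    # 0 - the trap ran and exited cleanly

# kill -9 (SIGKILL) cannot be trapped; the process just dies.
sleep 30 &
pid=$!
kill -9 "$pid"
wait "$pid"
echo "status after SIGKILL: $?"    # 137 = 128 + 9, i.e. killed by signal 9
```

This is why a shutdown script like ds.rc uses kill -15: it gives the process a chance to clean up before exiting.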

Posted: Thu Sep 07, 2006 1:10 pm
by Rajesh_kr82
I am also facing the same SIGKILL problem. My jobs are getting aborted even with a small amount of data. My job is a multiple-instance job; I run one instance at a time, and every time I supply a different input file. Out of 50-60 runs, the job fails maybe once or twice. Was anyone able to successfully get rid of this SIGKILL problem?

My buffers are set to default values:
APT_BUFFERING_POLICY Automatic buffering
APT_BUFFER_DISK_WRITE_INCREMENT 1048576
APT_BUFFER_FREE_RUN 0.5
APT_BUFFER_MAXIMUM_MEMORY 3145728
APT_BUFFER_MAXIMUM_TIMEOUT 1

Could anyone find the patch required for AIX?

Madhu,

What solution did you get from the Ascential people?

Posted: Thu Sep 07, 2006 3:48 pm
by ray.wurlod
It's beginning to look like the software (whether DataStage or something else) might itself be killing processes with a SIGKILL signal, which is never recommended practice. Do you run from a job sequence that includes a Terminator activity? Is the machine heavily loaded when this occurs and, if so, does the operating system have any load-balancing software installed?
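One way to check whether a process was terminated by a signal, and which one, is its exit status: POSIX shells report 128 + the signal number for a signal-terminated child, so a process killed with SIGKILL shows status 137. A minimal sketch (the sleep command stands in for the job's process):

```shell
#!/bin/sh
# Start a stand-in "job" process and kill it the hard way.
sleep 60 &
pid=$!
kill -9 "$pid"          # SIGKILL, as in the error message
wait "$pid"
status=$?

# Shells encode "terminated by signal N" as exit status 128 + N.
if [ "$status" -gt 128 ]; then
    echo "killed by signal $((status - 128))"    # here: killed by signal 9
else
    echo "exited normally with status $status"
fi
```

Checking the status of the aborted process this way can at least confirm that it was signal 9 specifically, and not some other failure, that took the job down.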