Unexpected termination by Unix signal 9(SIGKILL)

dh_Madhu
Premium Member
Posts: 65
Joined: Sat Apr 23, 2005 3:19 am
Location: Stirling, Scotland

Post by dh_Madhu »

Hi,
Recently, while running a sequencer that has a loop stage with quite a few Command Execute stages inside the loop, I encountered a fatal error that reads:
1. Unexpected termination by Unix signal 9(SIGKILL)
followed by
2. Failure during execution of operator logic.

What could be the reason for this error?
It surfaced when a Command Execute stage ran the following command:
/bin/date +"%r"
which returns the time (for example, 10:00:32 AM).
Has this got something to do with the Unix connection?
Should I add another Command Execute stage that issues an "exit" command to try to please the Unix box? :roll:
The error usually occurs towards the end of the loop (normally after about 25 iterations).
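
One way to check whether the command itself is being killed, independently of DataStage, is to run it in a loop from a plain script and record how each run ended. The following is a minimal sketch, assuming Python 3 is available on the same Unix host; the helper names `describe_exit` and `run_repeatedly` are hypothetical and not part of DataStage.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def run_repeatedly(command, iterations):
    """Run command `iterations` times; return a list of (iteration, description)."""
    results = []
    for i in range(1, iterations + 1):
        completed = subprocess.run(command, capture_output=True, text=True)
        results.append((i, describe_exit(completed.returncode)))
    return results


if __name__ == "__main__":
    # The command from the post; 25 matches the iteration count mentioned there.
    for i, outcome in run_repeatedly(["/bin/date", "+%r"], 25):
        print(i, outcome)
```

If every iteration reports a normal exit here but the sequence still fails, the signal is more likely aimed at a DataStage process than at the command itself.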

Thanks in Advance.
Regards,
Madhu Dharmapuri
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Someone killed the process with kill -9
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dh_Madhu
Premium Member
Posts: 65
Joined: Sat Apr 23, 2005 3:19 am
Location: Stirling, Scotland

Post by dh_Madhu »

Ray,
Given that this does not happen consistently, could there be a flaw in the job design?
If so, could executing a series of Command Execute stages bring Unix down?
Some time back, other members of the team were also struck by this error (again, only occasionally) when jobs were run using a third-party tool.
Regards,
Madhu Dharmapuri
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, the SIGKILL (kill -9) is something that is either explicitly issued by a person or perhaps by a monitoring tool when it thinks Something Is Wrong.

DataStage itself won't do this. Perhaps this "third party tool" you've mentioned? Or perhaps an Admin who is not all that DataStage-aware? I've heard stories where exactly that has happened to people. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Sometimes the admin people get worried about the DS jobs and kill the process.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

They need to be educated out of this practice using minimum necessary force.
dh_Madhu
Premium Member
Posts: 65
Joined: Sat Apr 23, 2005 3:19 am
Location: Stirling, Scotland

Post by dh_Madhu »

I am quite sure that there was no intervention by any individual.
Also, these jobs run perfectly after they are compiled and re-run.
Anyway, we have the Ascential team coming down to solve this problem, and I shall come back with what they say.
Thanks a lot guys.
Regards,
Madhu Dharmapuri
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Someone DID issue a kill -9 command, Madhu, but probably won't admit to it. This is the only way that this particular signal can be generated. No-one creates applications or scripts that use this command; it's just too dangerous.

For example, take a look at $DSHOME/sample/ds.rc (the DataStage startup/shutdown script). It uses kill -15 which is a far more graceful approach to signalling a process to shut down.
Rajesh_kr82
Participant
Posts: 24
Joined: Sat Oct 15, 2005 1:09 pm

Post by Rajesh_kr82 »

I am also facing the same SIGKILL problem. My jobs abort even with small amounts of data. My job is a multiple-instance job. I run one instance at a time, and every time I supply different input files. Out of 50-60 runs, the job fails once or twice. Was anyone able to get rid of this SIGKILL problem?

My buffers are set to default values:
APT_BUFFERING_POLICY Automatic buffering
APT_BUFFER_DISK_WRITE_INCREMENT 1048576
APT_BUFFER_FREE_RUN 0.5
APT_BUFFER_MAXIMUM_MEMORY 3145728
APT_BUFFER_MAXIMUM_TIMEOUT 1

Could anyone find the patch required for AIX?

Madhu,

What solution did you get from the Ascential people?
Regards,
Rajesh
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It's beginning to look like the software (whether DataStage or other) itself might be killing processes with a SIGKILL signal, which is never recommended practice. Do you run from a job sequence that includes a Terminator activity? Is the machine heavily loaded when this occurs and, if so, does the operating system have any load-balancing software installed?