Are there any special considerations when killing a job that is running on a grid?
What is the proper sequence of steps to kill a job running on a grid that seems to be hung (i.e. one that does not respond to a stop request from Director)?
How to kill a Grid job
You have to determine the nature of the hang. Is it waiting on a resource that is not available? Do you have an older version of the Grid Enablement Toolkit that didn't properly test the return code of your grid submit command?
I would suggest obtaining the latest Toolkit.
If the job is hung in the "Waiting to be released from Queue" state, then you have to kill the grid job itself. You could also dummy up an _end file in your GRID_JOB_DIR path to simulate a finished job, but that might cause the tool to see a valid termination rather than an abort.
I typically kill the DSD.RUN process, then make sure the grid queue no longer holds that entry.
The first attempt should always be an abort via the Director tool.
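If it helps, here is a rough sketch of how you might locate the DSD.RUN process for one job before killing it. It only finds the PIDs; it does not send any signal and does not touch the grid queue, since the queue commands depend on your resource manager. The assumptions are mine, not from the Toolkit: that `ps -eo pid,args` is available on the engine host, and that the job name appears as its own argument on the DSD.RUN command line. The function names and the sample process lines in the usage note are hypothetical.

```python
import subprocess


def find_dsd_run_pids(ps_output, job_name):
    """Return PIDs from `ps -eo pid,args` text whose arguments include
    both DSD.RUN and the given job name (assumed command-line layout)."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split()
        # Skip the header line and anything without a numeric PID.
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        args = parts[1:]
        if "DSD.RUN" in args and job_name in args:
            pids.append(int(parts[0]))
    return pids


def list_dsd_run_pids(job_name):
    """Run ps on the local host and filter for the job's DSD.RUN process."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_dsd_run_pids(result.stdout, job_name)
```

For example, `find_dsd_run_pids(" 1234 phantom DSD.RUN MyJob 0/0/1/0/0\n", "MyJob")` returns `[1234]`. Check the PIDs by eye before killing anything, and afterwards confirm the entry is gone from the grid queue with whatever query command your resource manager provides.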