
PX Job not terminating

Posted: Fri Aug 19, 2005 5:23 am
by Salegueule
It looks like a job started yesterday at 7:55 PM is not terminating. It has been running all night, for almost 11 hours now. I first tried to stop it from the Director, but that did nothing; it is still running. Although we seem to have killed all related processes in Unix last night, it is still running somewhere in the background.

I ran CLEAR.FILE &PH& from the Administrator command line, and the job is still running.

Although it might sound a bit drastic, do you think a DS restart or a server reboot would help at this point?


Thanks

Posted: Fri Aug 19, 2005 6:34 am
by ArndW
Hello Salegueule,

Doing a CLEAR.FILE &PH& will just clear the existing log files; it won't stop a job from continuing to run.
It might be that your job has actually finished running (with an abort) but the Director doesn't know that. If you run "ps -ef | grep {your-user}", do you see any processes that use the ...orch... programs?
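As a rough illustration of the check suggested above, here is a Python sketch that applies the same filter to captured `ps -ef` text: it keeps lines whose first column (the owning user) matches, and whose remaining text contains "orch". The function name `find_orch_processes` is hypothetical, the match on "orch" only mirrors the "...orch..." hint in the post (the actual PX program names on your system may differ), and the sketch assumes the usual `ps -ef` layout with UID in the first column.

```python
import subprocess
from typing import List


def find_orch_processes(ps_output: str, user: str, pattern: str = "orch") -> List[str]:
    """Return `ps -ef` lines owned by `user` that mention `pattern`.

    Only the first column (UID) is split off, because the STIME column can
    contain a space (e.g. "Aug 19") for processes started on an earlier day.
    """
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 1)
        if len(fields) < 2:
            continue
        uid, rest = fields
        if uid == user and pattern in rest:
            matches.append(line)
    return matches


def live_ps_output() -> str:
    """Capture the output of `ps -ef` on the local Unix host."""
    return subprocess.run(["ps", "-ef"], capture_output=True,
                          text=True, check=True).stdout
```

For example, `find_orch_processes(live_ps_output(), "dsadm")` would list candidate leftovers for a user named `dsadm` (a placeholder; substitute your own user). Unlike the shell pipeline, the `grep` process itself never shows up in the result.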

Short of stopping the machine, I think you could bring down DataStage (but don't try to restart it until all processes using DS have stopped).

One of the options in the Director is "clear status file", which you might be able to use. This will make the job no longer show up as "running", but it is a mistake to do this if the actual processes are still going, because it is easy to chew up a lot of CPU. BTW, if you check the CPU and IO usage on your machine, do you see that the system is active or inactive?

Posted: Mon Aug 22, 2005 9:20 am
by rajpatel
ArndW's suggestion to clear the status file works wonderfully, and once it is done the job goes back to the Compiled state.

We had the same situation you are having; we followed that route and it settled everything.

--raj

Posted: Mon Aug 22, 2005 4:31 pm
by ray.wurlod
Curiously, perhaps, a re-boot will not help in this circumstance.

A DataStage job records its status periodically in a table in the Repository. So do some of its active stages and "resources". This information is read by the Director client.

When people report that a job is not terminating, it usually means that the status table (RT_STATUSnn for job number nn) has not been updated with a status of "Finished", for example.

This may be for a number of reasons, the most common being that one of the processes was killed with a SIGKILL signal, or a child process failed to notify its parent successfully. In PX jobs the child processes are player processes and their parents are section leader processes, which in turn are child processes of the conductor process.
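To look at the conductor / section leader / player hierarchy described above on a live system, one approach is to rebuild the process tree from the PID and PPID columns of `ps -ef`. The Python sketch below does that. The function names are hypothetical, it assumes the standard `UID PID PPID ...` column order, and you supply the PID of the conductor process yourself.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_pid_ppid(ps_output: str) -> Dict[int, int]:
    """Map PID -> PPID using the second and third columns of `ps -ef`."""
    parents = {}
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        parents[int(fields[1])] = int(fields[2])
    return parents


def process_tree(ps_output: str, root_pid: int) -> List[Tuple[int, int]]:
    """Depth-first list of (depth, pid) for root_pid and all its descendants."""
    children = defaultdict(list)
    for pid, ppid in parse_pid_ppid(ps_output).items():
        if pid != ppid:  # some systems list PID 0 as its own parent
            children[ppid].append(pid)
    result, seen = [], set()
    stack = [(0, root_pid)]
    while stack:
        depth, pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        result.append((depth, pid))
        # Push in reverse order so lower PIDs are visited first.
        for child in sorted(children[pid], reverse=True):
            stack.append((depth + 1, child))
    return result
```

Depth 0 would be the conductor, depth 1 its section leaders, and depth 2 their players. On Unix, when a parent process dies its children are normally reparented (traditionally to PID 1), so players whose section leader was killed with SIGKILL would typically no longer appear under the conductor in this tree.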