Job won't terminate

stan_taylor
Charter Member
Posts: 14
Joined: Tue Mar 04, 2003 3:27 pm

Job won't terminate

Post by stan_taylor »

I have a job that should have terminated but has not. It was a test job designed to wait a long time for a file to show up. It should have waited for three days, and the log shows that is exactly what it said it would do. The three days are long past, however, and according to the log the stage has not completed. I have also touched the file hoping that might clear things up, but no dice. I know we have brought the DataStage server down twice since this job was started, and (based on my very limited knowledge of DataStage) I do not see any processes that look like they might be related.

I would like to get these jobs out of a 'Running' state so that I can at least look at them to see what the problem might be. I would also like to know what caused this in the first place. Anyone have any ideas on how to recover, or how to prevent this in the future? Thanks!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The "status" you are seeing in Director is not necessarily the actual status of the job. Rather, it is the most recently updated status of the job in its RT_STATUSxx table.
Open a monitor on the job to determine whether all active stages in the job have finished. I would expect that they have, though again they may not have been able to update THEIR records in the RT_STATUSxx table.
Lack of updating in the RT_STATUSxx table is usually the result of one of two things: abnormal termination of the stage or job processes, or loss of a signal. (The job awaits a signal from its children, the stage processes.)
If, in Monitor view, all the stages show as Finished but, in Status view, the job shows as Running, then there's probably been a lost signal. If a process has been abnormally terminated (for example by a kill -9 signal, or an abort situation in processing), it does not get the opportunity to update its entry in the RT_STATUSxx table.
This situation is why the "Clear Status File" option exists in Director (it must be enabled from Administrator). Using this option removes all status information about the job, so that it appears to have been freshly compiled.
If you want to be really certain, the files in the &PH& directory contain the process IDs of job processes (files with DSD.RUN... names) and stage processes (files with DSD.StageRun... names). You can then verify that these processes no longer exist, for example with ps -ef in UNIX, or Task Manager in Windows.
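The check described above could be scripted roughly as follows. This is only a sketch under assumptions: the helper names `list_ph_files` and `pid_alive` are made up for illustration, the path to the project's &PH& directory has to be supplied by you, and because the layout of the files is not described here, the process IDs still have to be read out of them by hand. The `pid_alive` check is for UNIX only.

```python
import os
from pathlib import Path

# Name prefixes taken from the post above: job processes and stage processes.
PREFIXES = ("DSD.RUN", "DSD.StageRun")


def list_ph_files(ph_dir):
    """Return sorted names of files in ph_dir that start with a known prefix."""
    return sorted(
        p.name for p in Path(ph_dir).iterdir()
        if p.is_file() and p.name.startswith(PREFIXES)
    )


def pid_alive(pid):
    """UNIX only: report whether a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```

Call `list_ph_files` with the &PH& directory of your project to see which files to open, then pass each process ID you find to `pid_alive`. If every one comes back False, the job and stage processes are gone and the 'Running' status is stale. On Windows, `os.kill` does not act as an existence check, so use Task Manager there instead.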


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
stan_taylor
Charter Member
Posts: 14
Joined: Tue Mar 04, 2003 3:27 pm

Post by stan_taylor »

Ray - as usual, great response! Thanks so much for your help.