DSRunJob returning fatal errors as non-fatal?
Posted: Fri Sep 07, 2007 12:06 pm
Folks;
Here's the skinny. We have a routine that runs every morning, which calls a parallel job. In that routine, here's the bit that actually is kicking off the job:
errStatus = DSRunJob(hJob, DSJ.RUNNORMAL)
errStatus = DSWaitForJob(hJob)
JobStatus = JobFailedCheck(DSGetJobInfo(hJob, DSJ.JOBSTATUS))
These three lines are exactly as-is - there is *NO* checking of those errStatus return codes. This is inherited code that works fine 99.999% of the time. Once in a blue moon (like Tuesday night) the following happens from what I can tell from the log file:
1) The DSRunJob is run
2) EXACTLY one minute later, the DSWaitForJob is called, which returns instantly.
3) The JobStatus is checked, finds a good job status, and the routine continues processing.
BUT - the job is never actually run. The DSWaitForJob finished because of course the job is in a finished status from the day before, and the JobStatus that is read is also that from the previous day's run.
Notice that the DSRunJob step took exactly one minute - the same length as the normal time-out limit. Usually when that happens a fatal error is thrown, but in this case no such error occurred. We had fifteen jobs kick off at once, and six of them failed in this fashion - and all six have the exact same times in their log files. So it seems like a normal case of DSRunJob timing out - except for the lack of a fatal error as mentioned.
From the number of jobs running and the times involved, it appears that this was a time-out error that wasn't flagged for some reason. But even if it was some other error (I can't tell because the return codes were not logged) I would think any error that causes the job not to be run should be a fatal error.
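For reference, here is a rough sketch of what checking those return codes might look like, so that a failed DSRunJob stops the routine instead of falling through to yesterday's status. This is not what our code does today. It assumes the standard DSJE.NOERROR constant from the job control include file, and RoutineName is just a hypothetical label for the log messages:
* Sketch only: abort the routine if the job could not be started or waited on.
* RoutineName is a hypothetical name used in the log entries.
RoutineName = "RunMorningJob"
errStatus = DSRunJob(hJob, DSJ.RUNNORMAL)
If errStatus <> DSJE.NOERROR Then
   Call DSLogFatal("DSRunJob failed, return code ":errStatus, RoutineName)
End
errStatus = DSWaitForJob(hJob)
If errStatus <> DSJE.NOERROR Then
   Call DSLogFatal("DSWaitForJob failed, return code ":errStatus, RoutineName)
End
* Only read the job status once both calls above have returned cleanly.
JobStatus = JobFailedCheck(DSGetJobInfo(hJob, DSJ.JOBSTATUS))
With something like this in place, the return code would at least end up in the log, which would tell us whether Tuesday's failure really was a time-out.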
Has anyone else ever seen anything like this before?
Seems strange to me...
Thanks - Richard