Fatal warning but job does not abort

Posted: Tue Mar 10, 2009 4:44 am
by joesat
Hi,

I have a simple extract job which extracts from a DB2 table. Recently there were some issues on the DB2 side, and as a result the queries were not processed properly.

All the server jobs that ran extracts failed with the fatal warning 'No FCM buffers available'. The same warning also cropped up in the PX job. However, the job did not fail and completed successfully!

Could someone explain this behaviour, and how I can get the job to fail when a fatal warning comes up? Thanks!

Posted: Tue Mar 10, 2009 6:25 am
by ArndW
The job should fail, unless the warning is such that it occurred and then was corrected automatically upon retry. Note that the message was a "warning" in your PX job. Was the result from the PX job correct or not? If not, then this is certainly a bug.

Posted: Tue Mar 10, 2009 2:55 pm
by jlock23
I have the EXACT same issue! We have several jobs where, in the log, there are either warnings or fatal errors and yet the job status is "Finished" (i.e. OK, i.e. JobStatus=1).

This is NOT acceptable to us. Looking at the status screen in Director, everything looks great. We have to open each of the logs individually and review it to see whether there are any fatal errors or warnings.

I hope you find out what is causing this because I have looked it up several times in the past without finding a solution.

Every time I bring up the problem, people focus on the error or warning and NOT on the fact that the log does not match the job status.

For us, it typically happens in jobs with ODBC connections, but we've seen it in other jobs as well.

Posted: Tue Mar 10, 2009 4:06 pm
by ray.wurlod
This is one area where parallel jobs are different from server jobs. A different mindset is needed.

Parallel jobs can finish with a status of "Finished" even though there are stages that generate "fatal" errors. This is because the exit status is that of the conductor process.

You can detect (in an after-job subroutine or in a parent sequence) whether there were any warnings/errors in the log, and do something about it.
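
As a rough illustration of that check when it is run from outside the job (for example from a scheduler wrapper), here is a minimal Python sketch built on the dsjob command-line client and its -logsum option. The function names are hypothetical, and the parsing rests on an assumption about the output layout: that each log entry begins on an unindented line whose first field is a numeric event id. Verify that against the output on your own installation before relying on it.

```python
import subprocess
from typing import Callable, Sequence


def run_dsjob_logsum(project: str, job: str, entry_type: str) -> str:
    """Return the raw text printed by `dsjob -logsum -type <entry_type>`.

    Assumes the dsjob client is on PATH and already able to reach the engine.
    """
    result = subprocess.run(
        ["dsjob", "-logsum", "-type", entry_type, project, job],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def count_log_entries(raw: str) -> int:
    """Count log entries in -logsum output.

    ASSUMPTION: an entry starts on an unindented line whose first
    character is a digit (the event id); message text is indented.
    """
    return sum(1 for line in raw.splitlines() if line[:1].isdigit())


def job_has_problems(
    project: str,
    job: str,
    fetch: Callable[[str, str, str], str] = run_dsjob_logsum,
    types: Sequence[str] = ("FATAL", "WARNING"),
) -> bool:
    """True if the log holds at least one entry of any of the given types."""
    return any(count_log_entries(fetch(project, job, t)) > 0 for t in types)
```

A wrapper script could call job_has_problems("MyProject", "MyJob") (placeholder names) after the job finishes and exit non-zero when it returns True. Note that the log summary may cover more than the most recent run, so the check may need narrowing to the latest run's entries.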

Posted: Tue Mar 10, 2009 4:29 pm
by Kryt0n
While not a solution, why don't you add an after-job subroutine (or a step in the sequence, if you run the job from one) to scan the log and raise an abort when warnings or errors are found?

Saves you having to manually scan logs...