Oracle Fatal Error not Fatal

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

bmsq
Premium Member
Posts: 32
Joined: Mon Oct 30, 2006 9:19 pm

Oracle Fatal Error not Fatal

Post by bmsq »

Hi guys,

I've just discovered some issues with several of our jobs which load into an Oracle database. These issues seem to have crept in slowly over a long period of time, but they have gone unnoticed because of some strange DataStage behavior.

What is happening is that during our load (Oracle bulk load, Direct=false, Parallel=true, Disable Constraints=true, Index Mode=Rebuild) we are loading data which breaks constraints, and the Oracle Enterprise stage throws a fatal error when trying to re-enable them. This is clearly an error in our jobs; however, errors like this are going unnoticed because the job still reports that it finished successfully despite raising several fatal errors.

I assume that this is not correct behavior and that the job should actually be aborting. Is there a setting that I have missed? If this can't be fixed through configuration, I'm thinking I might have to create an after-job routine to check for any fatal errors and then raise my own from within DataStage BASIC.
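
Something along these lines is what I had in mind for the after-job routine - only a sketch, with the routine name made up and the DSGetLogSummary argument handling from memory rather than tested:

    * Body of a Before/After Subroutine (the name "CheckFatalEvents" is illustrative).
    * Setting ErrorCode to a non-zero value makes the calling job finish as aborted.
    ErrorCode = 0

    * Only look at log events from the current run.
    StartTime = DSGetJobInfo(DSJ.ME, DSJ.JOBSTARTTIMESTAMP)

    * Summary of any fatal events logged since the job started
    * (MaxNumber = 0 means no limit on the number of entries returned).
    FatalEvents = DSGetLogSummary(DSJ.ME, DSJ.LOGFATAL, StartTime, "", 0)

    If FatalEvents <> "" Then
       Call DSLogWarn("Fatal events were logged during this run.", "CheckFatalEvents")
       ErrorCode = 1   ;* report the job as aborted
    End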

Any thoughts or suggestions?

Thanks,
Barry
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

My thought is that you are using server job thinking. In server jobs a "fatal" error causes the job to abort. Not in parallel jobs, however. An operator (player process) may abort, but its section leader and conductor process do not. This enables control to be maintained and clean-up to be performed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bmsq
Premium Member
Posts: 32
Joined: Mon Oct 30, 2006 9:19 pm

Post by bmsq »

Ray,

I can understand that this affects the individual players while the conductor process continues so that cleanup can happen. But the conductor process can't clean up broken constraints; instead we end up with a job reporting RUNOK while the database has one or more constraints left in a disabled state.

Surely, if this is the designed behavior, there would be some way to configure the job to report when something like this goes wrong?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you think the product should work differently, submit an enhancement request. There's a forum here on DSXchange for doing that.

You could, of course, detect the condition in a job sequence and trigger some remedial processing, email notification, or whatever - even abort the job sequence. Personally, I prefer never to abort anything.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bmsq
Premium Member
Posts: 32
Joined: Mon Oct 30, 2006 9:19 pm

Post by bmsq »

Thanks again Ray,

Just on your last point, what is the "normal" way that you would detect this type of error? I can only think of using DSGetLogSummary() with DSJ.LOGFATAL within the DataStage BASIC API to detect this situation.

Cheers,
Barry
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I'd search for a *.bad or *.log*bad file generated by sqlldr.
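
If you wanted to do that from BASIC rather than by eye, something like the following in an after-job routine would serve. The directory is an assumption on my part - point it at wherever your load options have sqlldr writing its files:

    * Sketch: look for sqlldr reject files left behind by the bulk load.
    * The directory below is illustrative only.
    Call DSExecute("UNIX", "ls /data/loadwork/*.bad 2>/dev/null", Output, SysReturnCode)

    If Trim(Output) <> "" Then
       Call DSLogWarn("sqlldr reject file(s) found: " : Output, "CheckBadFiles")
    End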
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
csrazdan
Participant
Posts: 127
Joined: Wed May 12, 2004 6:03 pm
Location: Chicago IL

Post by csrazdan »

With the following configuration:

(Oracle bulk load, Direct=false, Parallel=true, Disable Constraints=true, Index Mode=Rebuild)

In my experience I have also faced the same problem: the load is successful and the index rebuild fails, but the job reports success. It is only when downstream jobs try to access the table that we notice it is not usable because the indexes are in a bad state.

To handle this situation, you will have to ensure the index rebuild does not fail for any reason, or at least not for any reason caused by your job. One of the main reasons the rebuild fails is duplicate records in your source data. You should include a Remove Duplicates stage in your job to take care of them.

Hope it helps.....
Assume everything I say or do is positive