Oracle Fatal Error not Fatal

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

bmsq
Premium Member
Posts: 32
Joined: Mon Oct 30, 2006 9:19 pm

Oracle Fatal Error not Fatal

Post by bmsq »

Hi guys,

I've just discovered some issues with several of our jobs which load into an Oracle database. These issues seem to have crept in slowly over a long period of time, but they have gone unnoticed because of some strange DataStage behavior.

What is happening is that during our load (Oracle bulk load, Direct=false, Parallel=true, Disable Constraints=true, Index Mode=Rebuild) we are loading data which breaks constraints, and the Oracle Enterprise stage throws a fatal error when trying to re-enable them. This is clearly an error in our jobs; however, errors like this are going unnoticed because the job still reports that it finished successfully despite raising several fatal errors.

I assume that this is not correct behavior and that the job should actually be aborting. Is there a setting that I have missed? If this can't be fixed through configuration, I'm thinking I might have to create an after-job routine to check for any fatal errors and then raise my own from within DataStage BASIC.
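
Something along these lines is what I had in mind for the after-job routine - only a sketch, with the routine name made up and the DSGetLogSummary argument handling from memory rather than tested:

    * Body of a Before/After Subroutine (the name "CheckFatalEvents" is illustrative).
    * Setting ErrorCode to a non-zero value makes the calling job finish as aborted.
    ErrorCode = 0

    * Only look at log events from the current run.
    StartTime = DSGetJobInfo(DSJ.ME, DSJ.JOBSTARTTIMESTAMP)

    * Summary of any fatal events logged since the job started
    * (MaxNumber = 0 means no limit on the number of entries returned).
    FatalEvents = DSGetLogSummary(DSJ.ME, DSJ.LOGFATAL, StartTime, "", 0)

    If FatalEvents <> "" Then
       Call DSLogWarn("Fatal events were logged during this run.", "CheckFatalEvents")
       ErrorCode = 1   ;* report the job as aborted
    End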

Any thoughts or suggestions?

Thanks,
Barry
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

My thought is that you are using server job thinking. In server jobs a "fatal" error causes the job to abort. Not in parallel jobs, however. An operator (player process) may abort, but its section leader and conductor process do not. This enables control to be maintained and clean-up to be performed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bmsq
Premium Member
Posts: 32
Joined: Mon Oct 30, 2006 9:19 pm

Post by bmsq »

Ray,

I can understand that this affects the individual players while the conductor process continues so that cleanup can happen. But the conductor process can't clean up broken constraints; instead we end up with a job reporting RUNOK while the database has one or more constraints left in a disabled state.

Surely, if this is the designed behavior, there would be some way to configure the job to report when something like this goes wrong?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you think the product should work differently, submit an enhancement request. There's a forum here on DSXchange for doing that.

You could, of course, detect the condition in a job sequence and trigger some remedial processing, email notification, or whatever - even abort the job sequence. Personally, I prefer never to abort anything.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bmsq
Premium Member
Posts: 32
Joined: Mon Oct 30, 2006 9:19 pm

Post by bmsq »

Thanks again Ray,

Just on your last point, what is the "normal" way that you would detect this type of error? I can only think of using DSGetLogSummary() with DSJ.LOGFATAL within the DataStage BASIC API to detect this situation.

Cheers,
Barry
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I'd search for a *.bad or *.log*bad file generated by sqlldr.
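
If you wanted to do that from BASIC rather than by eye, something like the following in an after-job routine would serve. The directory is an assumption on my part - point it at wherever your load options have sqlldr writing its files:

    * Sketch: look for sqlldr reject files left behind by the bulk load.
    * The directory below is illustrative only.
    Call DSExecute("UNIX", "ls /data/loadwork/*.bad 2>/dev/null", Output, SysReturnCode)

    If Trim(Output) <> "" Then
       Call DSLogWarn("sqlldr reject file(s) found: " : Output, "CheckBadFiles")
    End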
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
csrazdan
Participant
Posts: 127
Joined: Wed May 12, 2004 6:03 pm
Location: Chicago IL

Post by csrazdan »

With the following configuration:

(Oracle bulk load, Direct=false, Parallel=true, Disable Constraints=true, Index Mode=Rebuild)

In my experience I have also faced the same problem: the load is successful and the index rebuild fails, but the job reports success. It is only when downstream jobs try to access the table that we notice it is not usable because the indexes are in a bad state.

To handle this situation, you will have to ensure the index rebuild does not fail for any reason, or at least not for any reason caused by your job. One of the main reasons the rebuild fails is duplicate records in your source data. You should include a Remove Duplicates stage in your job to take care of them.

Hope it helps.....
Assume everything I say or do is positive