At a very high level I'm finding that uncommitted rows can still be committed even if the job aborts. When the problem was first brought to me I said "no way" but we confirmed it and then I was able to replicate it. The tests I've run are for a very specific error - ORA-01400 Cannot insert null - but my gut tells me that this would happen for any error.
When Oracle pushes an error back up from OCI into a Server job's log, it actually logs three warnings for each occurrence. For me, those were:
Code: Select all
W) TestOCICommit..TEST_OCI: The value of the row is: TEST_ROW = NULL RUN_DATE = 2009-08-13 09:51:42
W) TestOCICommit..TEST_OCI: ORA-01400: cannot insert NULL into ("DM_DSS_STAGE"."TEST_OCI"."TEST_ROW")
W) TestOCICommit..TEST_OCI: DBMS.CODE=ORA-01400
Code: Select all
W) TestOCICommit..TEST_OCI.Insert_New: DSP.Close Error -100 in $DSP.Close.
The issue seems to center around how a "partially filled" array is handled when you only send that one array of data to the database. Using an Array Size of 1 means you'll never see this issue. If the count of rows you are sending to Oracle are greater than the Array Size you'll never see this issue. But if the first array is also the last array of rows you send and it is only partially full, that is when I can see this issue. My specific problem reported was for loading 49 rows where 1 (the last row) was bad with an Array Size of 128. So the first array was the last array and it wasn't full. Now, it may be very unlikely that your record count is smaller than the Array Size but it can happen for small volumes, hence my concern.
The baffling part is the fact that this is further complicated by the Warning Limit you use when you run the job - i.e. how many Warnings will cause a Fatal error to be logged and abort the job. Set too high (or gawdforbid to Unlimited) you won't see this problem as the warnings don't abort the job. What's extremely perplexing to me is that if I set the Warning Limit to 1 or 2 everything works as expected and I get an appropriate rollback. Here's an example where it was set to 2:
Code: Select all
W) TestOCICommit..TEST_OCI: The value of the row is: TEST_ROW = NULL RUN_DATE = 2009-08-12 17:02:07
W) TestOCICommit..TEST_OCI: ORA-01400: cannot insert NULL into ("DM_DSS_STAGE"."TEST_OCI"."TEST_ROW")
F) Job Aborted after 2 errors logged.
W) Attempting to Cleanup after ABORT raised in stage TestOCICommit..TEST_OCI
This is not what happens when the Warning Limit is 3 with no other changes made to the job or settings. I get a very similar string of messages logged:
Code: Select all
W) TestOCICommit..TEST_OCI: The value of the row is: TEST_ROW = NULL RUN_DATE = 2009-08-12 17:14:32
W) TestOCICommit..TEST_OCI: ORA-01400: cannot insert NULL into ("DM_DSS_STAGE"."TEST_OCI"."TEST_ROW")
W) TestOCICommit..TEST_OCI: DBMS.CODE=ORA-01400
F) Job Aborted after 3 errors logged.
W) Attempting to Cleanup after ABORT raised in stage TestOCICommit..TEST_OCI
I am at a complete loss to explain why this rolls everything back when the Warning Limit is 1 or 2 but the records commit if the Warning Limit is set to 3. The job aborts in either case but somehow allowing that one more wafer-thin warning to be logged lets the commit happen regardless. And to recap this is only the case when the Array Size is larger than the incoming record count so you've only sent one incomplete array to OCI.
I don't know if this is a DataStage bug in their OCI code or something I can pin on Oracle, perhaps a bug in the client is all that comes to mind. I wish I could get them to upgrade the client software so I can see if that changes the behaviour but I don't really see that happening. So I post here in hopes of someone having been down this path before who can shed some light on the issue.
Thanks for reading all the way to the end!
(you *did* read the whole post, didn't you?)