
'Unit Of Work' commits?

Posted: Mon May 24, 2010 8:41 am
by mczern
I'm working with a client that is relatively new to DataStage and has mainly dealt with COBOL on the mainframe. They are enquiring about the ability of DataStage to mimic the concept of 'Unit of Work'....

Is there a way to package a DataStage job to commit related records to multiple tables?
Is there a way to force the upserts to roll back if one of the table upserts within that 'Unit of Work' fails?
And finally, if both of the above are possible, is there a way to continue processing 'Units of Work' after a failure (i.e. if a failure occurs during the 3rd of 5 units of work, restart the process from the 3rd 'Unit of Work' instead of from the very beginning of the cycle)?

This smells like one or more sequences of jobs, but I'm really not sure. I've never had to implement this concept before within DataStage.

Posted: Mon May 24, 2010 10:14 am
by kduke
Some of this you can do with Server jobs. I think version 8 has transaction grouping in PX as well. The hard part is restarting from the point of failure; you need to figure that out on your own. Transaction groups will allow you to group records across links and commit them all as one transaction. It is crude but simple, and it does work.
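
To picture what a transaction group buys you, here is roughly the same idea in plain Python/DB-API terms (a sketch only, with made-up parent/child tables, not DataStage syntax):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER, qty INTEGER)")

    # One "transaction group": rows bound for different tables (links)
    # are written under a single transaction and committed together.
    try:
        conn.execute("INSERT INTO parent VALUES (?, ?)", (1, "order-1"))
        conn.execute("INSERT INTO child VALUES (?, ?, ?)", (10, 1, 5))
        conn.execute("INSERT INTO child VALUES (?, ?, ?)", (11, 1, 3))
        conn.commit()      # all rows in the group land together...
    except sqlite3.Error:
        conn.rollback()    # ...or none of them do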

Posted: Mon May 24, 2010 2:10 pm
by ray.wurlod
The Distributed Transaction stage (version 8.1.2 and later) allows for transaction grouping, two-phase commit and lots of related things.

(I have to check which of this functionality is currently available and which is in the next release. But I am not currently able to do so.)

Posted: Mon May 24, 2010 6:18 pm
by chulett
The ODBC stage can be used to enforce 'Unit of Work' commits; I've done it in the past (with Ernie's help) in web service jobs. You have to manage the 'commit' and 'rollback' statements at the proper times / boundaries, but it can be done.
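
In plain terms, "managing the boundaries" looks something like this (a rough Python/sqlite3 sketch, nothing DataStage-specific; the 'units' batches and target table are invented):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (unit INTEGER, val TEXT)")

    # 'units' stands in for batches of related incoming rows (made up here)
    units = [[(1, "a"), (1, "b")], [(2, "c")]]

    for unit in units:
        try:
            for unit_no, val in unit:
                conn.execute("INSERT INTO target VALUES (?, ?)", (unit_no, val))
            conn.commit()        # boundary reached: this unit is durable
        except sqlite3.Error:
            conn.rollback()      # undo only the in-flight unit
            raise                # fail the job here; restart is a separate problem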

As Kim noted, restarts are completely on you. Typically, it is a matter of leaving breadcrumbs: knowing where the last (failed) run ended and skipping incoming data (usually via a parameter-driven constraint on row counts) until you get past that point. And, of course, this is really only possible with a static source.
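
The breadcrumb approach, sketched in the same plain Python terms (the checkpoint table, file name and job name are all made up for illustration):

    import sqlite3

    conn = sqlite3.connect("jobstate.db")   # a file, so the breadcrumb survives a crash
    conn.execute("CREATE TABLE IF NOT EXISTS checkpoint (job TEXT PRIMARY KEY, last_unit INTEGER)")
    conn.execute("CREATE TABLE IF NOT EXISTS target (unit INTEGER, payload TEXT)")
    conn.commit()

    row = conn.execute("SELECT last_unit FROM checkpoint WHERE job = 'demo'").fetchone()
    last_done = row[0] if row else 0        # the breadcrumb left by the previous run

    units = [(n, f"data-{n}") for n in range(1, 6)]   # static source, as noted above
    for n, payload in units:
        if n <= last_done:                  # skip until past the failure point
            continue
        conn.execute("INSERT INTO target VALUES (?, ?)", (n, payload))
        conn.execute("INSERT OR REPLACE INTO checkpoint VALUES ('demo', ?)", (n,))
        conn.commit()                       # unit and breadcrumb committed together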

Posted: Tue May 25, 2010 5:46 am
by eostic
Yes... it can be done. It's not "easy", but it is definitely do-able. A Server Transformer can send multiple links into an ODBC target... one link (say) for an insert to the parent table, another link (say) for the insert to a child... and depending on what is happening in each row upstream, there could be many children for each parent. What happens if the 72nd child row gets rolled back because of a problem in the rdbms? Or if your own logic decides that the 72nd child row is no good? You use custom SQL with two additional links: one that is a "rollback" link and another that is a "commit" link. You decide explicitly when, how and where to fire those links, via a complex set of constraints that checks your own transaction validity rules, and also rigorously checks the return codes that come back from each of the parent and child links.

This is very explicit control. I have found that it is most consistent with the ODBC stage, due to fluctuations in return codes from the rdbms. It is a complex technique, but it works.
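
Stripped of the stage-and-link mechanics, the control flow amounts to something like this (a Python/sqlite3 sketch with invented tables; the commit and rollback "links" become explicit calls):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE child  (id INTEGER PRIMARY KEY, parent_id INTEGER, val INTEGER)")

    def load_unit(parent_id, children):
        """One unit of work: a parent row plus all of its children.
        The commit/rollback 'links' are fired only after every row's
        return code and our own validity rules have been checked."""
        try:
            conn.execute("INSERT INTO parent VALUES (?)", (parent_id,))
            for child_id, val in children:
                if val < 0:                        # our own rule fails the unit
                    raise ValueError(f"child {child_id} rejected")
                conn.execute("INSERT INTO child VALUES (?, ?, ?)",
                             (child_id, parent_id, val))
            conn.commit()                          # the 'commit' link
            return True
        except (sqlite3.Error, ValueError):
            conn.rollback()                        # the 'rollback' link: parent and children undone
            return False

    load_unit(1, [(10, 5), (11, 7)])               # commits
    load_unit(2, [(20, 3), (21, -1)])              # bad child -> whole unit rolled back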

Ernie

Posted: Wed May 26, 2010 9:58 am
by Aquilis
On the Server canvas, 'transaction grouping' comes into the picture when the Oracle stage (ODBC/OCI) receives multiple links from the same Transformer stage. The problem with a PX job is that multiple processes are started, and there is no way for one process to roll back another process's work.
For PX jobs you will need to look at the Unit of Work operator.
I raised a ticket with IBM on this long ago, and IBM came back to me saying it would be included in 8.1 and later versions. I am using 8.1 now, but it does not have this functionality yet.

Hopefully this can be verified against 8.1.x.
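
The "one process cannot roll back another" point is easy to demonstrate outside DataStage (a Python/sqlite3 sketch; two connections stand in for two PX player processes, and the file name is made up):

    import sqlite3

    # Each connection has its own transaction, so one cannot undo
    # work the other has already committed.
    conn_a = sqlite3.connect("px_demo.db")
    conn_b = sqlite3.connect("px_demo.db")
    conn_a.execute("CREATE TABLE IF NOT EXISTS t (src TEXT)")
    conn_a.commit()

    conn_a.execute("INSERT INTO t VALUES ('process-A')")
    conn_a.commit()                      # process A's rows are now permanent

    conn_b.execute("INSERT INTO t VALUES ('process-B')")
    conn_b.rollback()                    # undoes B's own rows only; A's survive

    print(conn_a.execute("SELECT * FROM t").fetchall())   # [('process-A',)]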

Posted: Wed May 26, 2010 10:12 am
by chulett
Transaction Grouping is another beast - similar, perhaps, but not the same as what the OP needs.