Transaction Management Handling

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You can't use 'transaction grouping' if you want an 'all or nothing' style load. It forces the transaction size to 1 so that each 'unit of work' is either committed or rolled back together.

If you want all or nothing, uncheck the grouping option and use a transaction size of 0 for all links.
-craig

"You can never have too many knives" -- Logan Nine Fingers
turash
Premium Member
Posts: 51
Joined: Fri Apr 06, 2007 10:09 am
Location: Edison, NJ

Post by turash »

chulett wrote:You can't use 'transaction grouping' if you want an 'all or nothing' style load. It forces the transaction size to 1 so that each 'unit of work' is either committed or rolled back together.

If you want all or nothing, uncheck the grouping option and use a transaction size of 0 for all links.
I want all or nothing for all three tables as a group. If anything goes wrong in any of the tables, all the tables should roll back.

The number of tables to transfer in the final production job will be 25.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As noted, you'll need to not use transaction grouping but rather a transaction size of zero. And then ensure that if there are any errors the job aborts, which will force the rollback.

Various ways to do that - run with a very low warning limit threshold, use a reject link set to abort after 1 record goes down it, call DSLogFatal... all kinds of ways. The important part is to ensure it always aborts at the first sign of trouble.
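
If it helps, here's the rough shape of the DSLogFatal route -- a minimal sketch of a server transform routine (the routine name "AbortOnError" and its single Msg argument are mine; the only real call in it is DSLogFatal). Paste it in as the body of a routine and call it from a Transformer constraint or derivation whenever a row fails whatever validation you're doing:

   * AbortOnError(Msg) -- force an immediate abort so that, with a
   * transaction size of 0, everything still uncommitted gets rolled back.
   * DSLogFatal logs a fatal event and aborts the job on the spot.
   Call DSLogFatal("Bad row, aborting so the load rolls back: " : Msg, "AbortOnError")

   * Never reached once the abort fires, but a routine must still set Ans.
   Ans = @FALSE

Wire it to whatever condition marks a row as bad, or skip the routine entirely and use the reject-link-set-to-abort-after-1 option; either way the point is the same -- abort before anything gets committed.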
-craig

"You can never have too many knives" -- Logan Nine Fingers
throbinson
Charter Member
Posts: 299
Joined: Wed Nov 13, 2002 5:38 pm
Location: USA

Post by throbinson »

Is it the case that a single row gets inserted into parent tables 1/2 and 5 child records get written to table 3? If so, how about changing the table 1/table 2 inserts to upserts so that they do not fail as the repetitive records 2-5 get written to table 3? As long as the upserts to tables 1/2 are identical, there's no harm except very bad performance.
turash
Premium Member
Posts: 51
Joined: Fri Apr 06, 2007 10:09 am
Location: Edison, NJ

Post by turash »

chulett wrote:As noted, you'll need to not use transaction grouping but rather a transaction size of zero. And then ensure that if there are any errors the job aborts, which will force the rollback.

Various ways to do that - run with a very low warning limit threshold, use a reject link set to abort after 1 record goes down it, call DSLogFatal... all kinds of ways. The important part is to ensure it always aborts at the first sign of trouble.

To prevent attempting to insert duplicate records into the tables that hold only one record (which results in an exception), I am using a hashed file to give unique rows for table 1, table 2 and table 3:

Table 1 (ODBCTpolkey) --> 1 row
Table 2 (ODBCTpolicy) --> 1 row
Table 3 (ODBCTpolded) --> 5 rows

But somehow, after a DB error with a low warning level set in DataStage, the transaction rollback is not happening.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Again, 'transaction rollback' can only happen under the following circumstances:

1) You have not yet committed the records, so 'Transaction Size' is important. Typically, that would mean using a setting of zero.

2) The job aborts. Period. It must abort in order to rollback all uncommitted work.

The only exception to #2 is when using 'Transaction Grouping' with single records across multiple links. There each unit of work can be rolled back individually - and that only because each previous unit of work has already been committed.

Not sure how many different ways you want us to say the same thing. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
turash
Premium Member
Posts: 51
Joined: Fri Apr 06, 2007 10:09 am
Location: Edison, NJ

Post by turash »

chulett wrote:Again, 'transaction rollback' can only happen under the following circumstances:

1) You have not yet committed the records, so 'Transaction Size' is important. Typically, that would mean using a setting of zero.

2) The job aborts. Period. It must abort in order to rollback all uncommitted work.

The only exception to #2 is when using 'Transaction Grouping' with single records across multiple links. There each unit of work can be rolled back individually - and that only because each previous unit of work has already been committed.

Not sure how many different ways you want us to say the same thing. :?
Thanks for the help. I got what you are trying to say.
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

Transaction handling with commit and rollback is a good and necessary feature of all databases. It was, and is, intended for normal database usage, such as inserting one order at a time that touches several tables.

You can use this feature within ETL processes as well, but for that purpose it is, in my opinion, something of a misuse of the feature. With ETL you are mostly loading only one table with multiple rows.

I have had the best experience with a process_ID field in each table, into which I put a special number for the whole process, so that I don't need to pay any attention to transaction handling. If the process went wrong (mostly I find that out hours after the last commit), I can delete all rows with this special process_id from the table, which is very easy to handle.

One process_id is enough for inserts. If you have updates and deletes as well, you need two IDs.
Wolfgang Hürter
Amsterdam
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

The goal described can be achieved by taking advantage of how Server puts all the links into the same connection, but as noted above, not by specifically using the Transaction Grouping capability.
It's not for the faint of heart, but using the ODBC Stage as a target, and having all links come from a single Transformer, I've successfully created situations with parent and child tables where I need to roll back a parent and all prior children if the "nth" child fails for some reason.
This is done by having two specific COMMIT and ROLLBACK links in addition to the links for each of the tables. Use Custom SQL and some dummy column on those links to "drive" them. Then test on the various Link variables (SQLSTATE, DBMSCODE, etc.) to determine which of those links should fire at a particular time.
I would agree with everyone else's comments here, and try to use simpler solutions, even various recovery models, if you can, because this technique is difficult and increases in complexity dramatically if you have more than just two tables. But sometimes you absolutely need this degree of control. I've found this to be especially true in more real-time scenarios using MQ or WISD.
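
Very roughly, and with everything except the SQLSTATE/DBMSCODE link variables invented for the sake of the example (the link names, the success test, the dummy column), the extra links out of the Transformer end up looking something like this:

   * ToParent, ToChild ... the normal insert links out of the single Transformer.
   *
   * CommitLink -- Custom SQL is simply "COMMIT", driven by a dummy column;
   *               its constraint fires only when every insert link succeeded:
   ToParent.DBMSCODE = 0 And ToChild.DBMSCODE = 0
   *
   * RollbackLink -- Custom SQL is simply "ROLLBACK"; its constraint fires
   *                 when any earlier link reported a database error:
   ToParent.DBMSCODE <> 0 Or ToChild.DBMSCODE <> 0

Check what your database actually reports for success (0 for DBMSCODE and "00000" for SQLSTATE are typical) before trusting those tests, and remember that link ordering matters -- the COMMIT and ROLLBACK links have to come after the table links they are testing.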

Ernie
Ernie Ostic

blogit!
Open IGC is Here! -- https://dsrealtime.wordpress.com/2015/0 ... ere/
turash
Premium Member
Posts: 51
Joined: Fri Apr 06, 2007 10:09 am
Location: Edison, NJ

Post by turash »

WoMaWil wrote:Transaction handling with commit and rollback is a good and necessary feature of all databases. It was, and is, intended for normal database usage, such as inserting one order at a time that touches several tables.

You can use this feature within ETL processes as well, but for that purpose it is, in my opinion, something of a misuse of the feature. With ETL you are mostly loading only one table with multiple rows.

I have had the best experience with a process_ID field in each table, into which I put a special number for the whole process, so that I don't need to pay any attention to transaction handling. If the process went wrong (mostly I find that out hours after the last commit), I can delete all rows with this special process_id from the table, which is very easy to handle.

One process_id is enough for inserts. If you have updates and deletes as well, you need two IDs.
The final aim of the job is to load 25 tables from source to target. These tables are related to each other in one-to-one and one-to-many relationships. There is a concept of a workunit to be transferred. A physical delete is not a good idea because the target is also a transactional database.

I have got some grip on the transactional control, but the problem now is that some of the 25 tables have multiple rows, and it tries to execute the job that many times.

Probably similar to a Cartesian product.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Why do you feel the need to load all 25 tables in one job? :?

Just because they are 'related' doesn't mean they all need to be loaded simultaneously. Most people would approach this in a more modular manner, using separate jobs, loading tables in RI order and tracking problems as you work your way down the food chain. Use hashed files as they are meant to be used - reference information each job can leverage to know what parent records exist and which don't. Use that to drive the work done in each child table. Lather, rinse, repeat.

Perhaps it would help if you provided sufficient detail around this comment so that it makes more sense: "There is a concept of workunit to be transferred." Can you explain what this concept means in your world, provide specific examples?
-craig

"You can never have too many knives" -- Logan Nine Fingers
turash
Premium Member
Posts: 51
Joined: Fri Apr 06, 2007 10:09 am
Location: Edison, NJ

Post by turash »

A workunit in this case is an insurance policy.

We need to move this insurance policy transaction from one transactional database to the target transactional database.

We are using a staging database with a structure similar to the target: transform the data from the source and load it into staging.

From staging, load all the tables for that insurance policy into the target transactional database.



chulett wrote:Why do you feel the need to load all 25 tables in one job? :?

Just because they are 'related' doesn't mean they all need to be loaded simultaneously. Most people would approach this in a more modular manner, using separate jobs, loading tables in RI order and tracking problems as you work your way down the food chain. Use hashed files as they are meant to be used - reference information each job can leverage to know what parent records exist and which don't. Use that to drive the work done in each child table. Lather, rinse, repeat.

Perhaps it would help if you provided sufficient detail around this comment so that it makes more sense: "There is a concept of workunit to be transferred." Can you explain what this concept means in your world, provide specific examples?