Transaction Grouping
Moderators: chulett, rschirm, roy
I am using Transaction Grouping to group 2 input streams to the same OCI Stage. One stream is an update to set a flag in a table and the second is an insert of a new record in the same table. What I want to happen is that when either stream fails, both transactions need to roll back.
However, when I force the job to abort during the insert and look at the underlying table, the previous update has been committed.
I have done the following:
Enabled transaction grouping
Stream Name: Update OnSkip: Continue OnFail: Rollback
Stream Name: Insert OnSkip: Continue OnFail: Rollback
Is there another option I need to set up?
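What EmmaP is after is standard atomic-transaction behaviour at the database level: the update and the insert live in one transaction, and a failure in either undoes both. A minimal sketch of that behaviour, using Python's sqlite3 standing in for Oracle/OCI (the table and column names here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, flag INTEGER)")
conn.execute("INSERT INTO t (id, flag) VALUES (1, 0)")
conn.commit()

try:
    # Stream 1: the update that sets the flag.
    conn.execute("UPDATE t SET flag = 1 WHERE id = 1")
    # Stream 2: the insert -- fails here with a duplicate-key error.
    conn.execute("INSERT INTO t (id, flag) VALUES (1, 0)")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()  # the earlier update is undone along with the insert

flag = conn.execute("SELECT flag FROM t WHERE id = 1").fetchone()[0]
print(flag)  # 0 -- the update was rolled back
```

The point is that nothing is committed until both statements succeed; a commit anywhere between the two statements would leave the update behind, which is the symptom described above.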
It was a generic reference, so I should have ditched the caps. You are correct that the field actually called 'Transaction Size' next to 'Array Size' is there for backwards compatibility only and is ignored. The 'Rows Per Transaction' tab is the only one that applies anymore.
That's what happens when you do stuff from memory.![Confused :?](./images/smilies/icon_confused.gif)
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Thanks for the responses. Was using a colleague's logon when I originally posted whilst getting my login sorted out.
I don't want to commit after each record, I want to commit after the entire process has completed. Therefore if either of the streams fails, roll back everything. You are right, array size is always 0 for transaction grouping.
Now that I've re-read the Help... since 'Rows Per Transaction' is always forced to 1 when 'Transaction Grouping' is enabled, all you can roll back is the current 'transaction' - in your case the pair of rows you're working with at any given time. That doesn't sound like what you want.
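To illustrate the consequence of that forced per-group commit: each pair of rows is committed as its own transaction, so a failure only rolls back the current pair, and earlier pairs stay in the table. A hypothetical sketch (Python with sqlite3, invented table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")

rows = [1, 2, 3, 2]  # the second pair will fail (duplicate id 2)
for i in range(0, len(rows), 2):
    try:
        for rid in rows[i:i + 2]:
            conn.execute("INSERT INTO t (id) VALUES (?)", (rid,))
        conn.commit()       # commit after every pair, as grouping forces
    except sqlite3.IntegrityError:
        conn.rollback()     # only the current pair is undone

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 2 -- the first pair survived the later failure
```

That is exactly why the earlier update is still in the table after the aborted insert: it belonged to a pair that had already been committed.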
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
I assume Omigos is talking about the practice of splitting your Inserts and Updates into separate jobs / streams to maximize the throughput of both, perhaps even bulk-loading your Inserts. Please correct me if I'm wrong.
However, since this thread is about Transaction Grouping, that may not be an option for you. For the record, how important is this concept to what you are doing or need to do? Or is it just something you are trying out for fun?![Laughing :lol:](./images/smilies/icon_lol.gif)
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
You are correct, bulk loading or splitting into separate stages is not an option for us (unless we can somehow ensure Transaction Grouping). That is, if we perform an update and then an insert, and the insert fails, the earlier update must not take effect. We do not have failures often, but when we do, we want the table to remain in the same state as before the job started. Performance is also a concern, as we need these jobs to run as quickly as possible.
Can you use a temporary table for this, and once the job has finished successfully, load the data from the temp table into the original table?
That way you will not be altering the data of the original table. This should be possible since you are updating the same records you are inserting into.
As far as I know, transaction grouping definitely commits one record at a time, and you can't set the number lower than one, so yes, there is a performance issue here.
All this should be fixed if you use a temp table.
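The staging-table idea can be sketched like this (Python with sqlite3; the staging and target table names are invented for illustration): the job writes everything to a scratch table first, and the real table is only touched in one final transaction, so a mid-job failure leaves it unchanged:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, flag INTEGER)")
conn.execute("INSERT INTO target VALUES (1, 0)")
conn.execute("CREATE TEMP TABLE staging (id INTEGER, flag INTEGER)")
conn.commit()

# The job writes all its updates and inserts into staging first;
# a failure here never touches the real table.
conn.execute("INSERT INTO staging VALUES (1, 1)")  # the 'update' row
conn.execute("INSERT INTO staging VALUES (2, 0)")  # the 'insert' row

# Only after the whole job succeeds is target modified, atomically.
try:
    conn.execute("""UPDATE target
                    SET flag = (SELECT flag FROM staging
                                WHERE staging.id = target.id)
                    WHERE id IN (SELECT id FROM staging)""")
    conn.execute("""INSERT INTO target
                    SELECT id, flag FROM staging
                    WHERE id NOT IN (SELECT id FROM target)""")
    conn.commit()
except sqlite3.Error:
    conn.rollback()  # target reverts to its pre-job state

rows = conn.execute("SELECT id, flag FROM target ORDER BY id").fetchall()
print(rows)  # [(1, 1), (2, 0)]
```

On Oracle the final apply step could be a single MERGE statement, which would also sidestep the separate update/insert streams entirely; the sketch above just shows the transactional shape of the approach.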
EmmaP wrote: You are correct, bulk loading or splitting into separate stages is not an option for us (unless we can somehow ensure Transaction Grouping). That is, if we perform an update and then an insert, and the insert fails, the earlier update must not take effect. We do not have failures often, but when we do, we want the table to remain in the same state as before the job started. Performance is also a concern, as we need these jobs to run as quickly as possible.

Performance and transaction grouping work at cross-purposes. Loading multiple tables simultaneously with referential integrity and parent-child relationships is not performant. You must commit the transaction across all grouped tables as a set; there is no whole-session commit capability, only the set level.
If you are doing a real-time solution using Datastage Server (unknown since you hijacked this post, friendly advice: I suggest you start new ones next time so we get your system/release information) then maybe you should look at the new RTI release.
Performant batch data warehousing is easy: high-performance loading is done with bulk loaders, and row-by-row DML is hundreds of times slower than bulk loading. You have eliminated parallel DML as an option, as well as other performance options (I don't know your database, so I can't speak directly to the point). Furthermore, you should be validating all constraints in your ETL solution, both data characteristics (type, length, precision, etc.) and RI. Any load failures should be 100% restartable: if you run out of space, simply extend the tablespaces and restart the loads.
I have avoided this post and others because the issue is not the tool. It's the design concept that is the problem; everything descends from this. Unless you're doing real-time, tried-and-true batch data warehouse concepts should apply. If you're struggling with RI, foreign key constraints, and a lack of ability to use bulk loaders, you need to read up on Kimball and every post on TDWI's website about do's and don'ts in ETL architecture.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kcbland wrote: If you are doing a real-time solution using Datastage Server (unknown since you hijacked this post, friendly advice: I suggest you start new ones next time so we get your system/release information) then maybe you should look at the new RTI release.

Actually Ken, you must have missed this a few posts up:

EmmaP wrote: Thanks for the responses. Was using a colleague's logon when I originally posted whilst getting my login sorted out.

So you can call off the Air Marshals, there's no hijacking here.
![Wink :wink:](./images/smilies/icon_wink.gif)
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers