Rollback Option in DataStage
-
- Participant
- Posts: 7
- Joined: Tue Dec 11, 2007 12:48 am
- Location: MUMBAI
I have a job that loads two tables into the database. The two tables have a parent-child relationship. The job reads from a Sequential File stage, passes the records through a Transformer stage, and finally splits each input record into two streams that are loaded into two different tables. I am using two DB2 API stages as target stages. The parent table has SO_ID as its primary key; the child table has a composite primary key of SO_ID and SO_LINE_NO, with SO_ID as a foreign key referencing the parent. My requirement is to roll back whatever was loaded into the parent table if the child table is not loaded properly.
Can anyone tell me how can I rollback the data in parent table if the child table is not loaded properly?
What do you mean by "not loaded properly"? The highest level of rollback is provided by the database's commit functionality - i.e. with a commit size of 0, if the job fails then no records are written. In your case that doesn't sound like what you are looking to do. DataStage has no such inherent functionality, but I would look into adding a date/time-modified column to the table and using it to roll back records that don't belong. The deletes would have to be coded in a job, though.
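The commit-size-0 behaviour ArndW describes can be sketched outside DataStage. A minimal Python sqlite3 sketch (table and column names follow the original post, but the schema itself is an assumption) showing that when the parent and child inserts share one transaction, a failure on the child rolls back the parent too:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (so_id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE child (
    so_id INTEGER REFERENCES parent(so_id),
    so_line_no INTEGER,
    PRIMARY KEY (so_id, so_line_no))""")

def load(parent_rows, child_rows):
    """Insert parent and child rows in one transaction; undo everything on failure."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.executemany("INSERT INTO parent VALUES (?)", parent_rows)
            conn.executemany("INSERT INTO child VALUES (?, ?)", child_rows)
        return True
    except sqlite3.Error:
        return False

# child row (99, 1) violates the foreign key, so the parent inserts are undone too
ok = load([(1,), (2,)], [(1, 1), (99, 1)])
print(ok, conn.execute("SELECT COUNT(*) FROM parent").fetchone()[0])  # False 0
```

This is the "commit size 0" idea in miniature: nothing is made permanent until the whole unit of work has succeeded.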
-
- Premium Member
- Posts: 273
- Joined: Wed Oct 18, 2006 12:20 pm
- Location: Porto
I agree with ArndW's suggestion - a TRANSACTION_SIZE of 0 is the best option for rollbacks, so if the job aborts for any reason the data will not be committed.
Also, I would suggest loading the child table first, and only when it has loaded successfully proceeding to load the parent table.
But remember one thing here: if any duplicates are found while inserting, only a warning is logged in DataStage. In 7.5.1A the job did not abort, so no rollback occurred when duplicates were found, creating a big mess!
In 7.5.2 this was supposed to be fixed, but it did not work (I wonder whether it worked for anyone else). So please check how the job behaves in DataStage 8 when it finds duplicates while inserting.
Share to Learn, and Learn to Share.
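The duplicate-key behaviour described above is why it can pay to fail fast on duplicates yourself instead of relying on the stage to abort. A hedged Python sketch (the row layout and key names are illustrative) that rejects a batch containing duplicate keys before anything is sent to the database:

```python
from collections import Counter

def find_duplicate_keys(rows, key_cols):
    """Return the key tuples that occur more than once in a batch of dict rows."""
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    return [key for key, n in counts.items() if n > 1]

rows = [
    {"SO_ID": 1, "SO_LINE_NO": 1},
    {"SO_ID": 1, "SO_LINE_NO": 2},
    {"SO_ID": 1, "SO_LINE_NO": 2},  # duplicate primary key
]
dups = find_duplicate_keys(rows, ["SO_ID", "SO_LINE_NO"])
print("rejecting batch, duplicate keys:", dups)  # [(1, 2)]
```

If the check fires, the batch can be aborted (or routed to a reject link) before any insert runs, so there is nothing to roll back.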
Dear Brah-Man,
Transaction mechanisms in databases were primarily made for single-row operations across multiple tables. Within DataStage you can use them as described by Arnd and Asita.
One problem with large volumes is that the rollback segments can fill up.
For many years we have had good experience with a process_ID, an additional column to fill in each table. Each process gets a new ID.
This way we are able to delete entire loads. Sometimes we only discover a day after loading that a load was wrong, and then it is very easy to delete the relevant rows by their ID.
If you have UPDATE and DELETE you need two IDs, and it works too.
Wolfgang Hürter
Amsterdam
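The process_ID scheme above can be sketched in SQL terms. A minimal sqlite3 illustration (the table, column names, and ID values are assumptions) where every run stamps its rows with its own load ID, so a bad load can be deleted afterwards regardless of commit size:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (so_id INTEGER, process_id INTEGER)")

def load(process_id, so_ids):
    # stamp every row of this run with the run's own process_id
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(so_id, process_id) for so_id in so_ids])

def undo_load(process_id):
    # "roll back" a committed load by deleting its stamped rows
    conn.execute("DELETE FROM orders WHERE process_id = ?", (process_id,))

load(101, [1, 2, 3])
load(102, [4, 5])   # suppose run 102 turns out to be wrong a day later
undo_load(102)
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 3
```

Unlike a database transaction, this works long after the commit, which is the point of the suggestion: the undo is an ordinary DELETE keyed on the load's ID.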
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Pre-load and preserve an image of the parent table in a hashed file. Use this image to prevent referential integrity violations. Mark every row you load, as Wolfgang suggested or with the job run timestamp, so that the rows can be undone irrespective of your commit setting.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
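The hashed-file lookup suggested above amounts to holding the parent keys in memory and rejecting child rows that would violate referential integrity. A rough Python sketch of the idea (not DataStage syntax; the key values are made up):

```python
# preloaded image of the parent table's keys (the role of the "hashed file")
parent_keys = {1, 2, 3}

child_rows = [(1, 1), (2, 1), (99, 1)]  # (SO_ID, SO_LINE_NO)

# let through only child rows whose SO_ID exists in the parent image;
# route the rest to a reject link instead of hitting the database
valid = [row for row in child_rows if row[0] in parent_keys]
rejects = [row for row in child_rows if row[0] not in parent_keys]
print(valid)    # [(1, 1), (2, 1)]
print(rejects)  # [(99, 1)]
```

Filtering before the load means the child insert can no longer fail on the foreign key, so there is nothing in the parent table to roll back.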
-
- Participant
- Posts: 7
- Joined: Tue Dec 11, 2007 12:48 am
- Location: MUMBAI
Thanks to everyone for such a fast reply.
I have to use two target DB2 stages, since I have to load two tables.
Regarding asita's suggestion, the child table cannot be loaded before the parent table, since the primary key of the parent table is a foreign key in the child table.
Besides transaction size, the DB2 API stage has another option called array size. I think the array size should also be considered.
Does anybody have any suggestion on array size?
If there were a single optimal array size, DataStage would use it. Array size is something that might make your job run faster; you have to experiment to find the optimal value for each stage and job.
As a caution where transaction management is concerned, it should be 1.
Wolfgang Hürter
Amsterdam