Rollback Option in DataStage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
mujeebrahman
Participant
Posts: 7
Joined: Tue Dec 11, 2007 12:48 am
Location: MUMBAI

Rollback Option in DataStage

Post by mujeebrahman »

I have a job in which I have to load two tables into the database. The two tables have parent child relatonship.I have a sequential file as input stage and then a transformer stage and finally I have to split the input record into two tables and load into the database as two different tables. I am using two DB2 API stages as target stages. The parent table has SO_ID as primary key and the child table has SO_ID and SO_LINE_NO as primary key. SO_ID is a foreign key in the child table.Now my requirement is to rollback whatever is loaded into the parent table if the child table is not loaded properly.

Can anyone tell me how can I rollback the data in parent table if the child table is not loaded properly?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

What do you mean by "not loaded properly"? The highest level of rollback is done by using the database commit functionality - i.e. if the commit size is 0 then if the job fails then no records are written. In your case this doesn't sound like what you are looking into doing. DataStage has so such inherent functionality, but I would look into adding a date/time modified column to the table and use that to rollback records that don't belong. But the deletes would have to be coded in a job.
asitagrawal
Premium Member
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

I agree with ArndW's suggestion...using TRANSACTION_SIZE of 0 is the best option for rollbacks... so if the job aborts due to any reason...then the data will not be committed.

Also, I would suggest to load the child table first..and only when it is successfully loaded, proceed to load the parent table .

But remember one thing here... while inserting if any duplicates are found, then a Warning is logged in datastage... In 7.5.1A the job did not abort.. so no rollback occurred when duplicates are found..creating a big mess !!
In 7.5.2 it was supposed to be working..but it did not work... (I wonder if it worked for any one else or not).. so please check this in DataStage 8...how the job behaves when it finds any duplicates while inserting...
Share to Learn, and Learn to Share.
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

Dear Brah-Man,

Transaction Mechanisms in DataBases was primarely made for single row operations of multiple tables. Within DataStage you can use them as described by Arnd and Asita.

One Problem with large volumes is that some time the rollback segments get full and fuller.

We made since many year good experience with a process_ID, which is an additional field in each table to fill. Each process get a new ID.

By this mean we are able to delete all loads. Sometime we verify, that we did a wrong load a day after the loading, so it is very easy to delete the relevant rows with the ID.

If you have UPDATE and DELETE you need 2 IDs and it works too.
Wolfgang Hürter
Amsterdam
mujeebrahman
Participant
Posts: 7
Joined: Tue Dec 11, 2007 12:48 am
Location: MUMBAI

Post by mujeebrahman »

Not loaded properly means sometimes some of the records will be loaded in child table and some of them are not. If a date field is there in the record, then because of incompatibilty, it is not loaded into the child table.
mujeebrahman
Participant
Posts: 7
Joined: Tue Dec 11, 2007 12:48 am
Location: MUMBAI

Post by mujeebrahman »

Not loaded properly means sometimes some of the records will be loaded in child table and some of them are not. If a date field is there in the record, then because of incompatibilty, it is not loaded into the child table.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Your job needs to abort in order to rollback any uncomitted records. And I'd suggest you use one target DB stage if at all possible.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Pre-load and preserve an image of the parent table in a hashed file. Use this image to prevent referential integrity constraints. Mark every row you load, as Wolfgang suggested or with the job run timestamp, so that they can be undone irrespective of your commit setting.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mujeebrahman
Participant
Posts: 7
Joined: Tue Dec 11, 2007 12:48 am
Location: MUMBAI

Post by mujeebrahman »

Thanks to everyone for such a fast reply.


I have to use two target DB2 stages since I have to load two tables.

As sugeested by asita, the child table cannot be loaded before the parent table since the primary key in parent table is the foreign key in child table.

Besides transaction size, there is one more option in DB2 API stage called array size. I think the array size should also be considered.

Do anybody have any suggestion on array size?
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

If there would be an optimal "Array size" DataStage would use it. Array size is something which might make you job perform in a better time. You have to make some experiments for to find the optimal point for each Stage and Job.

Concerning a caution Transaction Management it should be "1".
Wolfgang Hürter
Amsterdam
asitagrawal
Premium Member
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

"Array Size" is the number of rows to be processed at the same time... "Transaction size" is the number of rows after whicha COMMIT needs to be issued...

Ideally..Transaction Size a multiple of Array Size.
Share to Learn, and Learn to Share.
Post Reply