Rollback Option in DataStage
-
- Participant
- Posts: 7
- Joined: Tue Dec 11, 2007 12:48 am
- Location: MUMBAI
I have a job that loads two tables into the database. The two tables have a parent-child relationship. The job reads from a Sequential File stage, passes the records through a Transformer stage, and finally splits each input record into two streams that are loaded into two different tables. I am using two DB2 API stages as target stages. The parent table has SO_ID as its primary key; the child table has a composite primary key of SO_ID and SO_LINE_NO, with SO_ID as a foreign key referencing the parent. My requirement is to roll back whatever was loaded into the parent table if the child table is not loaded properly.
Can anyone tell me how can I rollback the data in parent table if the child table is not loaded properly?
What do you mean by "not loaded properly"? The highest level of rollback is provided by the database's commit functionality - i.e. with a commit size of 0, if the job fails then no records are written. In your case that doesn't sound like what you are looking to do. DataStage has no such inherent functionality, but I would look into adding a date/time-modified column to the table and using it to roll back records that don't belong. The deletes would have to be coded in a job, though.
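The commit-size-0 behaviour ArndW describes can be sketched outside DataStage. A minimal Python sqlite3 sketch (table and column names follow the original post, but the schema itself is an assumption) showing that when the parent and child inserts share one transaction, a failure on the child rolls back the parent too:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (so_id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE child (
    so_id INTEGER REFERENCES parent(so_id),
    so_line_no INTEGER,
    PRIMARY KEY (so_id, so_line_no))""")

def load(parent_rows, child_rows):
    """Insert parent and child rows in one transaction; undo everything on failure."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.executemany("INSERT INTO parent VALUES (?)", parent_rows)
            conn.executemany("INSERT INTO child VALUES (?, ?)", child_rows)
        return True
    except sqlite3.Error:
        return False

# child row (99, 1) violates the foreign key, so the parent inserts are undone too
ok = load([(1,), (2,)], [(1, 1), (99, 1)])
print(ok, conn.execute("SELECT COUNT(*) FROM parent").fetchone()[0])  # False 0
```

This is the "commit size 0" idea in miniature: nothing is made permanent until the whole unit of work has succeeded.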
-
- Premium Member
- Posts: 273
- Joined: Wed Oct 18, 2006 12:20 pm
- Location: Porto
I agree with ArndW's suggestion - a TRANSACTION_SIZE of 0 is the best option for rollbacks, so if the job aborts for any reason the data will not be committed.
Also, I would suggest loading the child table first, and only when it has loaded successfully proceeding to load the parent table.
But remember one thing here: if any duplicates are found while inserting, only a warning is logged in DataStage. In 7.5.1A the job did not abort, so no rollback occurred when duplicates were found, creating a big mess!
In 7.5.2 this was supposed to be fixed, but it did not work (I wonder whether it worked for anyone else). So please check how the job behaves in DataStage 8 when it finds duplicates while inserting.
Share to Learn, and Learn to Share.
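The duplicate-key behaviour described above is why it can pay to fail fast on duplicates yourself instead of relying on the stage to abort. A hedged Python sketch (the row layout and key names are illustrative) that rejects a batch containing duplicate keys before anything is sent to the database:

```python
from collections import Counter

def find_duplicate_keys(rows, key_cols):
    """Return the key tuples that occur more than once in a batch of dict rows."""
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    return [key for key, n in counts.items() if n > 1]

rows = [
    {"SO_ID": 1, "SO_LINE_NO": 1},
    {"SO_ID": 1, "SO_LINE_NO": 2},
    {"SO_ID": 1, "SO_LINE_NO": 2},  # duplicate primary key
]
dups = find_duplicate_keys(rows, ["SO_ID", "SO_LINE_NO"])
print("rejecting batch, duplicate keys:", dups)  # [(1, 2)]
```

If the check fires, the batch can be aborted (or routed to a reject link) before any insert runs, so there is nothing to roll back.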
Dear Brah-Man,
Transaction mechanisms in databases were primarily made for single-row operations across multiple tables. Within DataStage you can use them as described by Arnd and Asita.
One problem with large volumes is that the rollback segments can fill up.
For many years we have had good experience with a process_ID, an additional column to fill in each table. Each process gets a new ID.
This way we are able to delete entire loads. Sometimes we only discover a day after loading that a load was wrong, and then it is very easy to delete the relevant rows by their ID.
If you have UPDATE and DELETE you need two IDs, and it works too.
Wolfgang Hürter
Amsterdam
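The process_ID scheme above can be sketched in SQL terms. A minimal sqlite3 illustration (the table, column names, and ID values are assumptions) where every run stamps its rows with its own load ID, so a bad load can be deleted afterwards regardless of commit size:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (so_id INTEGER, process_id INTEGER)")

def load(process_id, so_ids):
    # stamp every row of this run with the run's own process_id
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(so_id, process_id) for so_id in so_ids])

def undo_load(process_id):
    # "roll back" a committed load by deleting its stamped rows
    conn.execute("DELETE FROM orders WHERE process_id = ?", (process_id,))

load(101, [1, 2, 3])
load(102, [4, 5])   # suppose run 102 turns out to be wrong a day later
undo_load(102)
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 3
```

Unlike a database transaction, this works long after the commit, which is the point of the suggestion: the undo is an ordinary DELETE keyed on the load's ID.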
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Pre-load and preserve an image of the parent table in a hashed file. Use this image to prevent referential integrity violations. Mark every row you load, as Wolfgang suggested or with the job run timestamp, so that the rows can be undone irrespective of your commit setting.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
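The hashed-file lookup suggested above amounts to holding the parent keys in memory and rejecting child rows that would violate referential integrity. A rough Python sketch of the idea (not DataStage syntax; the key values are made up):

```python
# preloaded image of the parent table's keys (the role of the "hashed file")
parent_keys = {1, 2, 3}

child_rows = [(1, 1), (2, 1), (99, 1)]  # (SO_ID, SO_LINE_NO)

# let through only child rows whose SO_ID exists in the parent image;
# route the rest to a reject link instead of hitting the database
valid = [row for row in child_rows if row[0] in parent_keys]
rejects = [row for row in child_rows if row[0] not in parent_keys]
print(valid)    # [(1, 1), (2, 1)]
print(rejects)  # [(99, 1)]
```

Filtering before the load means the child insert can no longer fail on the foreign key, so there is nothing in the parent table to roll back.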
-
- Participant
- Posts: 7
- Joined: Tue Dec 11, 2007 12:48 am
- Location: MUMBAI
Thanks to everyone for such a fast reply.
I have to use two target DB2 stages, since I have to load two tables.
Regarding asita's suggestion, the child table cannot be loaded before the parent table, since the primary key of the parent table is a foreign key in the child table.
Besides transaction size, the DB2 API stage has another option called array size. I think the array size should also be considered.
Does anybody have any suggestion on array size?
If there were a single optimal array size, DataStage would use it. Array size is something that might make your job run faster; you have to experiment to find the optimal value for each stage and job.
As a caution where transaction management is concerned, it should be 1.
Wolfgang Hürter
Amsterdam