While writing in the DB2 Enterprise stage, if the job is aborted what is the flexibility of restarting it by deleting the written records in the previous unsuccessful run in the same job other than looking up the same table again in the same job before writing it or running one more job for solving this purpose.
Is there any feature in this stage to solve this problem
restartability in db2 enterprise stage
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 31
- Joined: Tue Jul 13, 2004 5:26 am
- Location: chennai
- Contact:
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Best practice dictates that every record loaded is marked with some form of identification of the load run, perhaps a batch ID, perhaps a timestamp.
If you do this, unwinding inserts becomes a simple constrained DELETE statement.
If you've stage a before-image of the table you're loading, unwinding becomes an only slightly more complex operation using rows from the before-image. This would the approach to take if you ever perform UPDATE or DELETE operations into DB2.
There's nothing in the ETL tool; what you're asking to do isn't strictly ETL. So you have to plan and to design your recovery strategy.
If you do this, unwinding inserts becomes a simple constrained DELETE statement.
If you've stage a before-image of the table you're loading, unwinding becomes an only slightly more complex operation using rows from the before-image. This would the approach to take if you ever perform UPDATE or DELETE operations into DB2.
There's nothing in the ETL tool; what you're asking to do isn't strictly ETL. So you have to plan and to design your recovery strategy.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Re: restartability in db2 enterprise stage
In DB2 stage you have options Update Commit Interval and Row Commit Interval. These are available if the write mode is Upsert and only Row Commit is available when write mode is write(insert only)lakshmipriya wrote:While writing in the DB2 Enterprise stage, if the job is aborted what is the flexibility of restarting it by deleting the written records in the previous unsuccessful run in the same job other than looking up the same table again in the same job before writing it or running one more job for solving this purpose.
Is there any feature in this stage to solve this problem
Row Commit Interval would commit after specified no of records were inserted and if job aborts it rolls back these records.
Happy DataStaging
-
- Participant
- Posts: 31
- Joined: Tue Jul 13, 2004 5:26 am
- Location: chennai
- Contact:
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
That's a question best posed to your DB2 DBA who must configure sufficient space to handle transactions of the size you desire. Or, more likely, to tell you to use smaller transactions.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
You face a larger issue with a partitioned database load. On high volumes you simply cannot issue a commit at the end, as the data is independently streaming to a node. PX bypasses the coordinator node because it has the ability to read the system catalog. It automatically partitions the data to that of the database and fires off independent processes to directly write each partition. You will not be able to "commit at the end" because of the independent nature of this activity.
What you will have to do is have an audit number in your target table that will identify each row to a given load cycle, like a batch number. To make a self-correcting and restartable load job, you'll have to query the database to find rows you've already loaded and take them out of the load set as you stream it.
What we just finished designing for a major customer of DataStage PX just a restartable template job. The job takes the load-ready data and cuts out the primary key and loads it into an ETL work table in the target and inner-joins it to the target table, extracting any keys already there into a filtering stream. The load data then streams against that filter and only lets thru rows that aren't in that filter stream. To optimize this process, we added a job parameter that would be set to the state of that jobs run to indicate if it's the first try or a retry run. In the case of a first try, no primary key rows get written to the ETL work table, thus causing that extra stream of logic to abbreviate immediately and let all rows pass thru efficiently. A lot of work, but necessary to achieve a high-performance restartable load template job.
What you will have to do is have an audit number in your target table that will identify each row to a given load cycle, like a batch number. To make a self-correcting and restartable load job, you'll have to query the database to find rows you've already loaded and take them out of the load set as you stream it.
What we just finished designing for a major customer of DataStage PX just a restartable template job. The job takes the load-ready data and cuts out the primary key and loads it into an ETL work table in the target and inner-joins it to the target table, extracting any keys already there into a filtering stream. The load data then streams against that filter and only lets thru rows that aren't in that filter stream. To optimize this process, we added a job parameter that would be set to the state of that jobs run to indicate if it's the first try or a retry run. In the case of a first try, no primary key rows get written to the ETL work table, thus causing that extra stream of logic to abbreviate immediately and let all rows pass thru efficiently. A lot of work, but necessary to achieve a high-performance restartable load template job.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle