I created a job to update records in a table. The job is designed like this:
DataSet ----> Join (Inner) <---- Oracle (Read)
                  |
                  |
                  v
            Oracle (Update)
If the DataSet and the Oracle (Read) stage don't produce many records, the update speed is acceptable. But when the Oracle (Read) stage outputs about 200 thousand records and the DataSet also outputs about 200 thousand, the update takes more than 20 hours. In the actual production environment, the Oracle (Read) stage will output more than 10 million records.
Could someone give some suggestions to improve the performance?
Performance issue with updating a huge number of records
-
- Participant
- Posts: 26
- Joined: Mon Nov 19, 2007 2:55 am
- Location: Hangzhou
What you perceive as a performance issue is really a design issue.
If possible, avoid updates in a bulk data processing application.
Updates are suitable in record-at-a-time OLTP applications with low volumes, but not for high-volume ETL applications. An update is a very expensive database operation.
Shift to a bulk data processing paradigm. Seek a design that utilizes bulk-load utilities.
Mike
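To make the bulk-load idea concrete, here is a minimal sketch of the staging-table pattern Mike is describing, using Python's sqlite3 purely as a stand-in for Oracle (table and column names are invented for illustration): instead of issuing one UPDATE per record, the changed rows are bulk-loaded into a staging table and then applied with a single set-based statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, val TEXT)")

# The existing target table: 10,000 rows.
conn.executemany("INSERT INTO target VALUES (?, ?)",
                 [(i, "old") for i in range(10_000)])

# Bulk-load the incoming dataset into the staging table
# (in Oracle this would be SQL*Loader or a direct-path load)...
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [(i, "new") for i in range(0, 10_000, 2)])

# ...then one set-based UPDATE replaces 5,000 row-at-a-time statements.
# In Oracle this would typically be a MERGE.
conn.execute("""
    UPDATE target
    SET val = (SELECT s.val FROM staging s WHERE s.id = target.id)
    WHERE id IN (SELECT id FROM staging)
""")
conn.commit()
```

The win comes from the database doing one set-based operation over the staging data rather than parsing, executing, and logging millions of individual statements.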
If you can't avoid updates, try using partitioned tables or parallel writes to the database. Change the upsert in DataStage to ensure that it always tries the update first, then the insert. Talk to your DBA about what options are available in the database to tune the job.
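For illustration, the "update first, then insert" upsert logic mentioned above can be sketched like this, again using sqlite3 as a stand-in for Oracle (the table name and helper function are hypothetical, not DataStage API):

```python
import sqlite3

def upsert(conn, row_id, val):
    # Try the update first...
    cur = conn.execute("UPDATE t SET val = ? WHERE id = ?", (val, row_id))
    # ...and only fall back to an insert if no existing row matched.
    if cur.rowcount == 0:
        conn.execute("INSERT INTO t (id, val) VALUES (?, ?)", (row_id, val))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'old')")

upsert(conn, 1, "updated")   # existing row: takes the UPDATE path
upsert(conn, 2, "inserted")  # new row: falls through to INSERT
conn.commit()
```

When most incoming rows already exist in the target, trying the update first avoids a failed insert (and its error handling) on every matched row.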
You have to change your design. It doesn't matter how efficiently you tune a single update... doing 10 million of them (are you really updating all of those?) will take forever. Join your full source Oracle table with your dataset... then completely replace the table's contents with a load/replace operation. At that volume, talk to your DBA about partitioning. The bulk load can benefit from that as well.
Mike
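The load/replace approach above can be sketched as follows, a hedged illustration with made-up names and sqlite3 standing in for Oracle: merge the changes into the full source extract (the join DataStage would do on the parallel engine), then rebuild the target in one bulk load instead of millions of individual UPDATEs.

```python
import sqlite3

# Full extract of the target table, and the incoming dataset of changes.
source = {i: "old" for i in range(10)}
changes = {3: "new", 7: "new"}

# The "join": changed rows win over the original rows.
source.update(changes)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)",
                 [(i, "stale") for i in range(10)])

# Replace step: stand-in for TRUNCATE followed by a bulk load
# (in Oracle, a direct-path SQL*Loader load into the emptied table).
conn.execute("DELETE FROM target")
conn.executemany("INSERT INTO target VALUES (?, ?)", source.items())
conn.commit()
```

The point is that the database never executes a per-row UPDATE at all; the merged result simply replaces the old contents in one bulk operation.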