I created a job to update records in a table. The job is designed like this:
DataSet ----> Join (Inner) <---- Oracle (Read)
                  |
                  |
                  v
            Oracle (Update)
If the DataSet and the Oracle (Read) stage don't produce many records, the update speed is acceptable. But when the Oracle (Read) stage outputs about 200 thousand records and the DataSet also outputs about 200 thousand, the update takes more than 20 hours. In the actual production environment, the Oracle (Read) stage will output more than 10 million records.
Could someone give some suggestions to improve the performance?
Performance issue with updating a huge number of records
-
- Participant
- Posts: 26
- Joined: Mon Nov 19, 2007 2:55 am
- Location: Hangzhou
What you perceive as a performance issue is really a design issue.
If possible, avoid updates in a bulk data processing application.
Updates are suitable in record-at-a-time OLTP applications with low volumes, but not for high-volume ETL applications. An update is a very expensive database operation.
Shift to a bulk data processing paradigm. Seek a design that utilizes bulk-load utilities.
Mike
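To make the bulk-load idea concrete, here is a minimal sketch of the staging-table pattern Mike is describing, using Python's sqlite3 purely as a stand-in for Oracle (table and column names are invented for illustration): instead of issuing one UPDATE per record, the changed rows are bulk-loaded into a staging table and then applied with a single set-based statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, val TEXT)")

# The existing target table: 10,000 rows.
conn.executemany("INSERT INTO target VALUES (?, ?)",
                 [(i, "old") for i in range(10_000)])

# Bulk-load the incoming dataset into the staging table
# (in Oracle this would be SQL*Loader or a direct-path load)...
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [(i, "new") for i in range(0, 10_000, 2)])

# ...then one set-based UPDATE replaces 5,000 row-at-a-time statements.
# In Oracle this would typically be a MERGE.
conn.execute("""
    UPDATE target
    SET val = (SELECT s.val FROM staging s WHERE s.id = target.id)
    WHERE id IN (SELECT id FROM staging)
""")
conn.commit()
```

The win comes from the database doing one set-based operation over the staging data rather than parsing, executing, and logging millions of individual statements.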
If you can't avoid updates, try using partitioned tables or parallel writes to the database. Change the upsert in DataStage to ensure that it always tries the update first, then the insert. Talk to your DBA about what options are available in the database to tune the job.
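For illustration, the "update first, then insert" upsert logic mentioned above can be sketched like this, again using sqlite3 as a stand-in for Oracle (the table name and helper function are hypothetical, not DataStage API):

```python
import sqlite3

def upsert(conn, row_id, val):
    # Try the update first...
    cur = conn.execute("UPDATE t SET val = ? WHERE id = ?", (val, row_id))
    # ...and only fall back to an insert if no existing row matched.
    if cur.rowcount == 0:
        conn.execute("INSERT INTO t (id, val) VALUES (?, ?)", (row_id, val))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'old')")

upsert(conn, 1, "updated")   # existing row: takes the UPDATE path
upsert(conn, 2, "inserted")  # new row: falls through to INSERT
conn.commit()
```

When most incoming rows already exist in the target, trying the update first avoids a failed insert (and its error handling) on every matched row.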
You have to change your design. It doesn't matter how efficiently you tune a single update... doing 10 million of them (are you really updating all of those?) will take forever. Join your full source Oracle table with your dataset... then completely replace the table's contents with a load/replace operation. At that volume, talk to your DBA about partitioning. The bulk load can benefit from that as well.
Mike
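The load/replace approach above can be sketched as follows, a hedged illustration with made-up names and sqlite3 standing in for Oracle: merge the changes into the full source extract (the join DataStage would do on the parallel engine), then rebuild the target in one bulk load instead of millions of individual UPDATEs.

```python
import sqlite3

# Full extract of the target table, and the incoming dataset of changes.
source = {i: "old" for i in range(10)}
changes = {3: "new", 7: "new"}

# The "join": changed rows win over the original rows.
source.update(changes)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)",
                 [(i, "stale") for i in range(10)])

# Replace step: stand-in for TRUNCATE followed by a bulk load
# (in Oracle, a direct-path SQL*Loader load into the emptied table).
conn.execute("DELETE FROM target")
conn.executemany("INSERT INTO target VALUES (?, ?)", source.items())
conn.commit()
```

The point is that the database never executes a per-row UPDATE at all; the merged result simply replaces the old contents in one bulk operation.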