performance issue with updating a huge number of records

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

jenny_wang
Participant
Posts: 26
Joined: Mon Nov 19, 2007 2:55 am
Location: Hangzhou

performance issue with updating a huge number of records

Post by jenny_wang »

I created a job to update records in a table. The job is designed like this:

DataSet -----> Join (Inner) <----- Oracle (Read)
                    |
                    |
                    |
              Oracle (Update)

If the DataSet and Oracle (Read) don't have many records, the update speed is OK. But when the Oracle (Read) outputs about 200 thousand records and the DataSet also outputs about 200 thousand, the update takes more than 20 hours. And in the actual environment, the Oracle (Read) will output more than 10 million records.
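For illustration only (this is not the DataStage stage itself, and the table and column names are invented), the Oracle (Update) stage effectively issues one UPDATE per joined row, roughly like this Python DB-API sketch:

import cx_Oracle  # driver chosen purely for illustration

# joined_rows stands in for the output of the Join stage: (key, new_value) pairs.
joined_rows = [(1, "A"), (2, "B")]

conn = cx_Oracle.connect("user/password@dbhost/service")
cur = conn.cursor()

for key, new_value in joined_rows:
    # One round trip and one indexed UPDATE per row -- this is what is slow
    # at 200 thousand rows and impractical at 10 million.
    cur.execute(
        "UPDATE target_table SET some_col = :1 WHERE pk_col = :2",
        (new_value, key),
    )

conn.commit()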

Could someone give some suggestions to improve the performance?
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

What you perceive as a performance issue is really a design issue.

If possible, avoid updates in a bulk data processing application.

Updates are suitable in record-at-a-time OLTP applications with low volumes, but not for most ETL applications. An update is a very expensive database operation.

Shift to a bulk data processing paradigm. Seek a design that utilizes bulk-load utilities.
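As a hedged sketch of what the bulk paradigm can look like outside DataStage (the staging table, column names and driver are assumptions for illustration, not anything from this thread): push the joined rows into a work table in large batches, then let the database apply the change in one set-based statement.

import cx_Oracle  # illustrative driver choice

conn = cx_Oracle.connect("user/password@dbhost/service")
cur = conn.cursor()

# rows stands in for the Join output, possibly millions of (key, new_value) pairs.
rows = [(1, "A"), (2, "B")]

# Batch the changes into a staging table instead of updating row by row.
# stg_changes and target_table are made-up names for this example.
cur.executemany("INSERT INTO stg_changes (pk_col, some_col) VALUES (:1, :2)", rows)

# One set-based statement replaces millions of individual UPDATE round trips.
cur.execute("""
    MERGE INTO target_table t
    USING stg_changes s
    ON (t.pk_col = s.pk_col)
    WHEN MATCHED THEN UPDATE SET t.some_col = s.some_col
""")
conn.commit()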

Mike
jenny_wang
Participant
Posts: 26
Joined: Mon Nov 19, 2007 2:55 am
Location: Hangzhou

Post by jenny_wang »

Thanks, Mike.
Could you give me some suggestions about the design?
The Oracle stage only does updates, without inserts,
so the performance is bad.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

If you can't avoid updates, try using partitioned tables or parallel processing to the database. Change the upsert in DataStage to ensure that you always try the update first, then the insert. Talk to your DBA about what options are available in the database to tune the job.
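A minimal sketch of that "update first, then insert" ordering, written in plain Python DB-API terms rather than as the DataStage upsert option (the table and column names are invented for the example):

def upsert(cur, key, new_value):
    # Try the UPDATE first; only fall back to INSERT when no existing row matched.
    # target_table, pk_col and some_col are placeholder names.
    cur.execute(
        "UPDATE target_table SET some_col = :1 WHERE pk_col = :2",
        (new_value, key),
    )
    if cur.rowcount == 0:
        cur.execute(
            "INSERT INTO target_table (pk_col, some_col) VALUES (:1, :2)",
            (key, new_value),
        )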
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

You have to change your design. It doesn't matter how efficiently you tune a single update... doing 10 million of them (are you really updating all of those?) will take forever. Join your entire source Oracle table with your dataset... then completely replace all of the data with a load/replace. At that volume, talk to your DBA about partitioning. The bulk load can benefit from that as well.
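One rough sketch of that load/replace pattern (again with invented names, and glossing over direct-path loads, partition exchange and index handling that the DBA would normally advise on): build the complete new image of the table and reload it in bulk.

import cx_Oracle  # illustrative driver choice

conn = cx_Oracle.connect("user/password@dbhost/service")
cur = conn.cursor()

# full_image stands in for the complete, already-joined data set:
# every row the target table should contain after the run.
full_image = [(1, "A"), (2, "B")]

cur.execute("TRUNCATE TABLE target_table")  # clear the old contents (DDL, commits implicitly)
cur.executemany(                            # reload everything in bulk batches
    "INSERT INTO target_table (pk_col, some_col) VALUES (:1, :2)",
    full_image,
)
conn.commit()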

Mike
jenny_wang
Participant
Posts: 26
Joined: Mon Nov 19, 2007 2:55 am
Location: Hangzhou

Post by jenny_wang »

Hi, Mike.
Actually, most of the records should be updated, not all of them.