
Upsert Vs Insert/Update

Posted: Tue Nov 20, 2007 5:15 pm
by pravin1581
Hi All,

We are developing jobs where we need to insert/update DB2 tables. Can you please tell me, in terms of performance, whether an upsert is preferable to doing the insert and update in two different stages?

Posted: Tue Nov 20, 2007 5:44 pm
by kcbland
Upsert is all DML (update else insert), whereas separating inserts from updates allows you to use enterprise/bulk loading at least for the inserts, and then just DML for the updates.

If you're really into performance, then bulk loading the updates into an ETL work table and executing a parallel DML statement from the command line gives you the fastest loading scheme.
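As a rough sketch of the contrast between the two schemes, using Python's bundled sqlite3 purely as a stand-in for DB2 (the table and column names are made up, and DB2 would typically spell the upsert as MERGE rather than ON CONFLICT):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO target VALUES (1, 'old')")
rows = [(1, "changed"), (2, "new")]  # a mix of existing and new keys

# Upsert: one DML statement handles both cases for every row.
conn.executemany(
    "INSERT INTO target (id, val) VALUES (?, ?) "
    "ON CONFLICT(id) DO UPDATE SET val = excluded.val",
    rows,
)
print(sorted(conn.execute("SELECT * FROM target")))
# -> [(1, 'changed'), (2, 'new')]

# Splitting the stream instead would let the genuinely new rows go
# through a bulk/enterprise load path, leaving only the changed rows
# to be applied as DML updates.
```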

Posted: Tue Nov 20, 2007 8:28 pm
by rleishman
Even if you don't want to use the bulk loading capabilities of an Enterprise stage, there are still advantages to be had from separating your inserts and updates.

If you use an UPSERT with an array size of (say) 500 with Insert set to try first, then DS will attempt to insert all 500 rows in one statement. If just ONE of them fails, then the entire array fails, and DS has to pick it apart and run the inserts that would succeed, followed by the updates.

i.e. the cost of the original INSERT and the ROLLBACK is wasted time.

If you know which rows already exist, you can improve performance by bypassing this failed INSERT.
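One way to "know which rows already exist" is to look up the target keys before loading and partition the stream, so the insert path never hits a duplicate-key failure in the first place. A minimal sketch of that idea, with sqlite3 again standing in for DB2 and hypothetical table/column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b")])

incoming = [(2, "b2"), (3, "c"), (4, "d")]

# Fetch the existing keys once, then split the incoming rows.
existing = {row[0] for row in conn.execute("SELECT id FROM target")}
inserts = [r for r in incoming if r[0] not in existing]  # safe to bulk load
updates = [r for r in incoming if r[0] in existing]      # plain DML updates

conn.executemany("INSERT INTO target VALUES (?, ?)", inserts)
conn.executemany("UPDATE target SET val = ? WHERE id = ?",
                 [(v, k) for k, v in updates])
print(sorted(conn.execute("SELECT * FROM target")))
# -> [(1, 'a'), (2, 'b2'), (3, 'c'), (4, 'd')]
```

Because no insert can collide with an existing key, the failed-array/rollback cycle described above never happens.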

Posted: Wed Nov 21, 2007 9:29 am
by pravin1581
kcbland wrote: Upsert is all DML (update else insert), whereas separating inserts from updates allows you to use enterprise/bulk loading at least for the inserts, and then just DML for the updates.

If you're really into performance, then bulk loading the updates into an ETL work table and executing a parallel DML statement from the command line gives you the fastest loading scheme.
A simple insert gives me the bulk loading facility. According to your suggestion, it is better if we keep them in two separate stages.

Posted: Wed Nov 21, 2007 9:32 am
by pravin1581
rleishman wrote:Even if you don't want to use the bulk loading capabilities of an Enterprise stage, there are still advantages to be had from separating your inserts and updates.

If you use an UPSERT with an array size of (say) 500 with Insert set to try first, then DS will attempt to insert all 500 rows in one statement. If just ONE of them fails, then the entire array fails, and DS has to pick it apart and run the inserts that would succeed, followed by the updates.

i.e. the cost of the original INSERT and the ROLLBACK is wasted time.

If you know which rows already exist, you can improve performance by bypassing this failed INSERT.
Can you please elaborate your answer? Sorry, I didn't get you.

Posted: Wed Nov 21, 2007 2:33 pm
by kcbland
If your dataset has rows that have to be handled as inserts and other rows that have to be handled as updates, you cannot BULK INSERT. You must separate the updates and use DML.

I assume you're trying to load as fast as possible. So, bulk loading inserts is the fastest method for inserts.

For updates, you can have a DS process read rows from a file and stream them to the database through a DML cursor, or you can take a faster approach: bulk load the updates into a work table, which is faster than streaming them through a cursor, and then run parallel DML updating from the work table to the target table, which is faster than single-threaded DML updating from a cursor.
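The work-table scheme described above might look roughly like this. This is only a sketch, with sqlite3 standing in for DB2 and invented table names; a real DB2 job would use the LOAD utility for step 1 and parallel DML for step 2:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE TABLE work (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

# Step 1: bulk-load the update rows into the work table
# (in DB2 this would be a bulk LOAD, not row-by-row DML).
conn.executemany("INSERT INTO work VALUES (?, ?)", [(1, "a2"), (3, "c2")])

# Step 2: one set-based UPDATE from the work table to the target,
# instead of streaming rows through a cursor one at a time.
conn.execute("""
    UPDATE target
    SET val = (SELECT w.val FROM work w WHERE w.id = target.id)
    WHERE id IN (SELECT id FROM work)
""")
print(sorted(conn.execute("SELECT * FROM target")))
# -> [(1, 'a2'), (2, 'b'), (3, 'c2')]
```

The point of the design is that both steps are set-based operations the database can parallelise, rather than a stream of per-row cursor updates.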

Posted: Wed Nov 21, 2007 3:51 pm
by rleishman
pravin1581 wrote: Can you please elaborate your answer? Sorry, I didn't get you.
There's more on upserts in this thread.

Posted: Wed Nov 21, 2007 5:30 pm
by kcbland
I just added that thread to my Favorites. That was great information.

Posted: Mon Nov 26, 2007 2:13 pm
by pravin1581
kcbland wrote:I just added that thread to my Favorites. That was great information.
Can anyone help me out on this?

Posted: Mon Nov 26, 2007 2:24 pm
by ray.wurlod
They already have. This thread and the one referred to contain excellent, best-practice advice.

Separate your inserts from your updates.