Getting better upsert performance

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dsisbank
Premium Member
Posts: 50
Joined: Fri Aug 11, 2006 4:53 am

Getting better upsert performance

Post by dsisbank »

Hi,
I use the write method UPSERT with upsert mode "Update then Insert", but performance is very bad. When I use "Insert then Update" mode, performance is very good, but the data is doubled if it has already been loaded before.
How can I solve this?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Why is the data doubling? That should not happen. Try trimming the keys and then sending them in. What about at the database end: are the keys a surrogate number or a character key?

You're seeing an increase in performance for 'insert then update' because there are more inserts than updates. The opposite will slow it down for the same reason. The best approach: split the inserts and updates.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
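The split-the-streams approach above can be sketched outside DataStage. This is an illustrative sketch only, using Python's sqlite3 in place of the real target database; the table `target` and its columns are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO target VALUES (1, 'old')")

incoming = [(1, "alice"), (2, "bob")]

# Pre-determine which rows are updates (key already present)
# and which are pure inserts, instead of upserting row by row.
existing = {row[0] for row in conn.execute("SELECT cust_id FROM target")}
updates = [r for r in incoming if r[0] in existing]
inserts = [r for r in incoming if r[0] not in existing]

# Each stream then gets the statement that fits it best.
conn.executemany("UPDATE target SET name = ? WHERE cust_id = ?",
                 [(name, cid) for cid, name in updates])
conn.executemany("INSERT INTO target VALUES (?, ?)", inserts)
conn.commit()
```

In a real job the split would typically be done with a lookup or change-capture stage feeding two separate database links, one insert-only and one update-only.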
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Getting better upsert performance

Post by chulett »

dsisbank wrote:I use the write method UPSERT with upsert mode "Update then Insert", but performance is very bad. When I use "Insert then Update" mode, performance is very good, but the data is doubled if it has already been loaded before.
You need unique indexes of some sort, PK or otherwise, to make anything like this work. It only does the second action if the first one fails after all.

If Update then Insert is slow, it's because it is doing full table scans for each record. If Insert then Update is fast but 'doubles' everything, that's because it is just inserting, which is allowed because you've got nothing to enforce uniqueness. I'll bet you if you run it again, it 'triples'... etc.

Add a unique index over the 'key' fields then pick the methodology that will get you the most 'hits' on the first try. Or as noted, split your actions.
-craig

"You can never have too many knives" -- Logan Nine Fingers
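The failure mode chulett describes is easy to reproduce. A minimal sketch, again using sqlite3 as a stand-in for the target database (table and index names are made up): without a unique index, re-running the load just doubles the rows; with one, the second INSERT fails and an insert-then-update strategy falls through to the UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No uniqueness enforced: re-running the same load doubles the data.
conn.execute("CREATE TABLE t_noidx (k INTEGER, v TEXT)")
for _ in range(2):
    conn.execute("INSERT INTO t_noidx VALUES (1, 'x')")
# t_noidx now holds two copies of the same key.

# With a unique index, the duplicate INSERT raises an error,
# so the 'second action' (the UPDATE) can take over.
conn.execute("CREATE TABLE t_idx (k INTEGER, v TEXT)")
conn.execute("CREATE UNIQUE INDEX ux_t_idx ON t_idx (k)")
for v in ("x", "y"):
    try:
        conn.execute("INSERT INTO t_idx VALUES (1, ?)", (v,))
    except sqlite3.IntegrityError:
        conn.execute("UPDATE t_idx SET v = ? WHERE k = 1", (v,))
# t_idx holds exactly one row for key 1, with the latest value.
```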
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

You must have a unique index on the upsert key columns (all the columns you have used in the WHERE clause of the update statement).
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

INSERTs are usually very fast; UPDATEs are notoriously slow. The best methods for loading are to partition-align your data with the database, avoid global indexes on table structures, pre-determine inserts from updates, pure-insert load your inserts, insert your updates into a runtime table, and use parallel DML to mass-update your data within the database environment.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
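The "insert your updates into a runtime table, then mass-update" step above can be sketched as follows. This is a simplified sqlite3 illustration of the set-based pattern, not parallel DML itself; the `target` and `stage` tables are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (k INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b")])

# Bulk-insert the update rows into a runtime (staging) table first;
# inserts are cheap, even for rows destined to become updates.
conn.execute("CREATE TABLE stage (k INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO stage VALUES (?, ?)", [(1, "a2"), (2, "b2")])

# Then apply all the updates in one set-based statement inside the
# database, instead of shipping them over row by row.
conn.execute("""
    UPDATE target
    SET v = (SELECT s.v FROM stage s WHERE s.k = target.k)
    WHERE k IN (SELECT k FROM stage)
""")
conn.commit()
```

On a database that supports it (e.g. Oracle), the final statement would be run with parallel DML enabled so the mass update itself is parallelized.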