Very slow updates vs. inserts

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
RodBarnes
Charter Member
Charter Member
Posts: 182
Joined: Fri Mar 18, 2005 2:10 pm

Very slow updates vs. inserts

Post by RodBarnes »

I am seeing VERY slow times when using an update vs. an insert (3 rows/second vs. 3000 rows/second). Is this normal for DataStage to exhibit this drastic of a difference?

I am relatively new to DataStage but have successfully created an ETL sequence that works very well using a type 2 (Kimball) model where each change in pertinent data creates a new record via inserts. No updates are done to the existing rows.

However, I needed to add a new element to the target and that required a way to initialize data in that new element outside of the normal ETL sequence. I did this by creating a new job that builds a hash of the keys and initial values, then uses an MSOLEDB stage to export the data to a sequential file. Another job then runs through this sequential file and feeds the output to an MSOLEDB update stage to update the new element in each row. When done, it regenerates the CRC hash file so that I don't get a huge set of inserts on the next run of the ETL. I am using an "Update existing records only" Input Action on the update stage.

What I found was it all worked but the update stage was incredibly slow. Probably there are some settings that will help me tune this?

Thanks for any ideas or suggestions.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

This is more an artifact of database behaviour rather than DataStage behaviour. A lot more checking takes place for an UPDATE than for an INSERT. You haven't encountered in noticeably in the past because you haven't been throwing the volume of rows at the database that DataStage is.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
search for existing posts on this subject.
there are several ways including:
1. rebuild new table and make a switch
2. load the values and keys to a temp table then run an update sql with join on the 2 tables.

p.s:
if DS was the cause for your slow performance in update you would not have got better performance in inserts.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
RodBarnes
Charter Member
Charter Member
Posts: 182
Joined: Fri Mar 18, 2005 2:10 pm

Post by RodBarnes »

Thanks, ray.wurlod. I suspected that was the case but was hoping there was some tuning in DS I could use. :-)

roy, I did search but didn't seem to find anything that particularly fit my question. #1 might work but I would need to include the keys from the original table in the inserts into the new table. I'll give this some thought.

I am intrigued by your #2: Could you elaborate on this? I understand the part about creating a temp table. Are you saying then to do an update that joins the two tables, updating values from one to the other? I've never tried that before and want to be sure I understand what you are recommending.

Thanks.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Roy is referring to something that I've heard called an 'Upsert'. Specifically (in Oracle for example) you can load your data into a temp table and then use the MERGE command to get new records and updates from the temp table to the target table by defining the join and telling it what to do when it 'matches' and 'not matches'. It can be enormously faster than pushing updates down an OCI link if done right.

Not sure what the equivalent (if any) would be for you. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
You got the idea, I think SQLServer also has the ability to update 1 table with a join to another one.
I just can't remember the syntax it was about 2 years ago.

ask your DBAs or check with the help.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
Post Reply