Very slow updates vs. inserts

Posted: Tue Mar 29, 2005 2:57 pm
by RodBarnes
I am seeing VERY slow times when using an update vs. an insert (3 rows/second vs. 3000 rows/second). Is it normal for DataStage to exhibit such a drastic difference?

I am relatively new to DataStage but have successfully created an ETL sequence that works very well using a type 2 (Kimball) model where each change in pertinent data creates a new record via inserts. No updates are done to the existing rows.

However, I needed to add a new element to the target and that required a way to initialize data in that new element outside of the normal ETL sequence. I did this by creating a new job that builds a hash of the keys and initial values, then uses an MSOLEDB stage to export the data to a sequential file. Another job then runs through this sequential file and feeds the output to an MSOLEDB update stage to update the new element in each row. When done, it regenerates the CRC hash file so that I don't get a huge set of inserts on the next run of the ETL. I am using an "Update existing records only" Input Action on the update stage.

What I found was that it all worked, but the update stage was incredibly slow. Are there some settings that will help me tune this?

Thanks for any ideas or suggestions.

Posted: Tue Mar 29, 2005 4:09 pm
by ray.wurlod
This is more an artifact of database behaviour than of DataStage behaviour. A lot more checking takes place for an UPDATE than for an INSERT: the database has to locate each existing row (typically via an index lookup), take locks, and log the before-image, none of which an INSERT needs. You haven't noticed it in the past because you haven't been throwing the volume of rows at the database that DataStage is.

Posted: Tue Mar 29, 2005 4:35 pm
by roy
Hi,
Search for existing posts on this subject. There are several ways, including:
1. Rebuild a new table and make the switch (rough sketch below).
2. Load the keys and values into a temp table, then run an UPDATE SQL with a join on the two tables.
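
For option 1, something along these lines (a rough sketch only; all table and column names are made up, and 'keys_and_values' stands for your temp table of keys plus the new column):

    -- Build a corrected copy of the table, then swap the names (T-SQL flavour).
    SELECT t.*, s.new_col
    INTO   target_new
    FROM   target t
    JOIN   keys_and_values s ON s.cust_key = t.cust_key;
    EXEC sp_rename 'target', 'target_old';
    EXEC sp_rename 'target_new', 'target';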

P.S.: If DS were the cause of your slow update performance, you would not have got better performance on inserts.

IHTH,

Posted: Wed Mar 30, 2005 10:26 am
by RodBarnes
Thanks, ray.wurlod. I suspected that was the case but was hoping there was some tuning in DS I could use. :-)

roy, I did search but didn't seem to find anything that particularly fit my question. #1 might work but I would need to include the keys from the original table in the inserts into the new table. I'll give this some thought.

I am intrigued by your #2: Could you elaborate on this? I understand the part about creating a temp table. Are you saying then to do an update that joins the two tables, updating values from one to the other? I've never tried that before and want to be sure I understand what you are recommending.

Thanks.

Posted: Wed Mar 30, 2005 10:54 am
by chulett
Roy is referring to something I've heard called an 'upsert'. Specifically, in Oracle for example, you can load your data into a temp table and then use the MERGE command to apply new records and updates from the temp table to the target table, by defining the join and telling it what to do when rows match and when they don't. It can be enormously faster than pushing updates down an OCI link if done right.
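
A rough sketch of the Oracle version (table and column names are made up; stg is the loaded temp table, tgt the target):

    -- Upsert: matched rows are updated, unmatched rows are inserted.
    MERGE INTO tgt t
    USING stg s
    ON (t.cust_key = s.cust_key)
    WHEN MATCHED THEN
      UPDATE SET t.new_col = s.new_col
    WHEN NOT MATCHED THEN
      INSERT (cust_key, new_col)
      VALUES (s.cust_key, s.new_col);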

Not sure what the equivalent (if any) would be for you. :?

Posted: Wed Mar 30, 2005 11:28 am
by roy
Hi,
You got the idea. I think SQL Server also has the ability to update one table with a join to another one; I just can't remember the syntax, it was about two years ago.

Ask your DBAs or check the online help.
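
If memory serves, it is something along these lines (untested sketch; table and column names are made up):

    -- T-SQL: update the target from the temp table via a join.
    UPDATE t
    SET    t.new_col = s.new_col
    FROM   target t
    JOIN   temp_keys s ON s.cust_key = t.cust_key;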

IHTH,