How to improve performance when using a user-defined update statement

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Madhu1981
Participant
Posts: 69
Joined: Wed Feb 22, 2006 7:49 am

How to improve performance when using a user-defined update statement

Post by Madhu1981 »

Hi,

I have a job like this:

Dataset ---> Copy Stage ---> DB2 Enterprise stage

Write Method: Upsert
Upsert mode: User-defined Update
Configuration file: 4 nodes
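
For reference, the user-defined update statement in the DB2 Enterprise stage is roughly of this shape (the table and column names below are only placeholders, not the real statement):

    -- placeholder table and column names; the real statement uses your own keys
    update schema.target_table
    set col1 = ORCHESTRATE.col1,
        col2 = ORCHESTRATE.col2
    where account_nr = ORCHESTRATE.account_nr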


Now I am getting 1 million records from input, and it is taking 3 hours to update them.

Is there any way to get better performance?

thanks in advance
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Madhu, all timings are relative, so nobody here can tell you what to do or even if these values are good or bad.
What is your effective speed if you do only inserts? Is your database partitioned? Do you have multiple constraints?
madhukar
Participant
Posts: 86
Joined: Fri May 20, 2005 4:05 pm

Re: How to improve performance when using a user-defined update statement

Post by madhukar »

Since it is taking so much time, you can try the following option:

Do the upsert processing in DataStage and then insert all of the records into the table.
pigpen
Participant
Posts: 38
Joined: Thu Jul 13, 2006 2:51 am

Post by pigpen »

I can think of two things to try; see if either helps.

1. Increase the commit count and the DB transaction log buffer (a sketch of the DB2 side is below).
2. Separate the upsert into one update process and one load process.
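
A minimal sketch of the database side of option 1, assuming a DB2 LUW database named MYDB (the database name and buffer size are examples only; on the DataStage side, the commit count is normally set in the stage properties or, in some releases, through an environment variable such as APT_RDBMS_COMMIT_ROWS):

    -- show the current database configuration, including LOGBUFSZ (in 4 KB pages)
    db2 get db cfg for MYDB

    -- example value only: raise the log buffer; takes effect when the database is reactivated
    db2 update db cfg for MYDB using LOGBUFSZ 2048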
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Another possibility is to take advantage of DB2's inherent parallel processing capabilities - write all of your data to a work table, then use normal DB2 sql to handle your inserts and updates (or use DB2 V8's merge syntax to do an upsert). That will allow you to take advantage of indexes that DataStage can't use.
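
As a rough sketch of the MERGE route, assuming a work table as described in the steps below (table and column names are placeholders):

    -- placeholder table and column names
    merge into schema.target_table t
    using schema.work_table w
        on (t.account_nr = w.account_nr)
    when matched then
        update set t.col1 = w.col1, t.col2 = w.col2
    when not matched then
        insert (account_nr, col1, col2)
        values (w.account_nr, w.col1, w.col2);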

Here's what we often do:

1) In your DataStage job, left outer join your new data (left) to the existing table's keys (right). The table read pulls only the keys and an indicator field, no data. Example:

    select account_nr, cast('UPDATE' as char(6)) as ld_action
    from schema.tablename

2) Use the indicator field to show whether a record is an 'INSERT' or an 'UPDATE'. Because you used a left outer join, ld_action will be NULL if the record is new, and 'UPDATE' if not. You can use a Modify stage to change NULL to 'INSERT'. Now your output record layout is input_data + ld_action.

3) Write all records to a work table. Add an index on your keys and on the ld_action field. You could consider using the 'load' mode instead of 'write' to avoid logging.

4) Use SQL to insert where ld_action = 'INSERT' (or NULL if you didn't modify it before) and update where ld_action = 'UPDATE'; a sketch of this SQL is below. Make sure your work table and target table are indexed the same, and do a runstats on your work table after it is loaded and indexed.

Whether it is 1 million records or 50 million, you should be able to get the time down to minutes instead of hours.
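
A sketch of the step 4 SQL, with placeholder table and column names (the update could equally be written with the MERGE syntax mentioned above):

    -- placeholder names throughout
    insert into schema.target_table (account_nr, col1, col2)
    select account_nr, col1, col2
    from schema.work_table
    where ld_action = 'INSERT';

    update schema.target_table t
    set (col1, col2) =
        (select w.col1, w.col2
         from schema.work_table w
         where w.account_nr = t.account_nr)
    where exists
        (select 1
         from schema.work_table w
         where w.account_nr = t.account_nr
           and w.ld_action = 'UPDATE');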

Hope this helps,

Brad.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

By the way, don't get me wrong. I am a staunch supporter of DataStage and use it wherever possible. However, you should always use the best tool for the job at hand. If DataStage does it best, then go for it; if your database can do it faster, then use the database.

Brad.