Job Running Slow - Need Guidance to improve performance

Post questions here relating to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


srinagesh
Participant
Posts: 125
Joined: Mon Jul 25, 2005 7:03 am

Job Running Slow - Need Guidance to improve performance

Post by srinagesh »

Hi,

We have a job that loads data from a flat file (90,000 records a day)
into a table (42 columns, 16 bitmap indexes, 2 B-tree indexes) using Update else Insert logic.
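(For illustration only, Update else Insert amounts to roughly the following per-row logic against the target; the table and column names below are placeholders, not our actual schema:)

-- Rough sketch of the per-row "Update else Insert" pattern (placeholder names).
UPDATE target_table
SET    col_a = :col_a,
       col_b = :col_b
WHERE  key_col = :key_col;

-- Only if the UPDATE matched no rows:
INSERT INTO target_table (key_col, col_a, col_b)
VALUES (:key_col, :col_a, :col_b);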

We have lookups against 16 tables to ensure referential integrity.

We have pre-created the hashed files for these lookup tables, and the hashed file sizes are based on the Hsh.Exe calculator supplied with the DataStage CD.

The target table stage has an array size of 2000 and a transaction size of 10000.

The load starts at 150 rows/sec (for the first 3,000 rows), then after a while drops to 100 (up to 8,000 rows), down to 60 (between 20,000 and 30,000 rows), and ultimately to 15 rows per second. In the end it takes about 2 hours to load the 90,000 records.

I cannot find any reason for the drop in the load rate.

Could someone help me tune this job to get good performance?


Regards
Nagesh
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Nagesh,

Change your job so that the output goes to a Sequential File stage with the path "/dev/null" and see what your performance numbers are and whether they remain constant. If the speed still slows down over time as it did before, then you can eliminate the database write as the cause. If the speed remains constant, then you have identified the culprit. What kind of database table are you writing to? Is it local to the server? Does it have constraints/keys/triggers?

A badly sized hashed file used in a lookup should show uniformly bad performance, not performance that changes during the run of a job.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Any chance you are writing to any of these hashed files (or to any other hashed file) as well during the load? That is also a Prime Suspect when dealing with progressive slowdown in a job.
-craig

"You can never have too many knives" -- Logan Nine Fingers
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

Hi Nagesh,

How many rows/sec are being read from the hashed files?? Any idea??

Are you pre-loading the data from the hashed files into memory? If not, I would suggest you do so, and also make sure the read cache is sized sufficiently.

You can increase the "Read Cache Size" in the Administrator under "Tunables". The default size is 128MB for both the read and write caches.

Let me know if this improves the loading time.

Thanks,
Naveen.
srinagesh
Participant
Posts: 125
Joined: Mon Jul 25, 2005 7:03 am

Post by srinagesh »

The hashed files are pretty static. There is no write into the hashed files during the load.

I was wondering whether the array size and transaction size settings have any effect?

The point I am coming from is that the array size is 2000 and the transaction size is 10000, and after 3,000 records or so the performance drops for the first time.

Is there an optimal setting for the array size? (I am using the Oracle stage and connecting to a 9i database.)


- Nagesh
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The first issue is not your absolute speed but the fact that it changes over the run of the job; that points to an underlying problem. Once you get that cleared up, you can look at your array sizes and commit frequencies.

What was the result of writing to /dev/null instead of the table?
ml
Participant
Posts: 82
Joined: Wed Jun 09, 2004 1:03 pm
Location: Argentina

Post by ml »

srinagesh wrote: Is there an optimal setting for the array size? (I am using the Oracle stage and connecting to a 9i database.)
Not at first, no. You need to test to find the optimal setting.
What performance does the job get if you change the Oracle stage to a Sequential File stage?
How many rows do you have in the Oracle table? How many rows are updated when you run this job?

Bear in mind that the "Update else Insert" option is not the best for performance. As your table grows, the increasing number of rows affects your performance, and with a transaction size of 10000 you also have a large number of uncommitted rows.

First I recommend trying an array size of 500 and a transaction size of 1000; start with these values and then test by increasing them. You should also check the indexes on this table.
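(If it helps, a quick way to see the indexes on the table and their state; TARGET_TABLE is a placeholder for your table name:)

-- List the indexes on the target table; 'TARGET_TABLE' is a placeholder
-- and must be the table name in upper case.
SELECT index_name, index_type, uniqueness, status
FROM   user_indexes
WHERE  table_name = 'TARGET_TABLE';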

Other options could be to split the links, one for inserts and another for updates, or maybe you can use the Oracle bulk loader or truncate/insert, but these options will depend on what is possible for you.

good luck.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

You really need to know how many inserts and how many updates are being processed. The suggestions to split inserts from updates and bulk load the inserts are excellent ones.

Your data could be trending 90% updates and 10% inserts, with the updates at the back of your source data. Your job could fly thru the inserts because the data started with them, then degrade as more updates occur. Without partitioning the data into inserts or updates, you really don't have a clue as to what's going on.
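(A rough way to check that mix, assuming the day's rows have been staged into a work table first; WORK_TABLE, MAIN_TABLE and KEY_COL are placeholder names:)

-- Rows whose key already exists in the main table would be updates,
-- the rest would be inserts. All names below are placeholders.
SELECT CASE WHEN m.key_col IS NULL THEN 'insert' ELSE 'update' END AS action,
       COUNT(*) AS row_count
FROM   work_table w
       LEFT OUTER JOIN main_table m
         ON m.key_col = w.key_col
GROUP  BY CASE WHEN m.key_col IS NULL THEN 'insert' ELSE 'update' END;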

Furthermore, separating inserts from updates allows you to apply the updates with high performance. You can bulk load the updates into a work table and then perform a parallel-enabled DML merge/update to get your updates done a whole lot faster.
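(Sketched for Oracle 9i, with placeholder table/column names and an arbitrary parallel degree, assuming the updates have already been bulk loaded into the work table:)

-- Apply the staged updates in one set-based parallel DML statement
-- instead of row by row. All names below are placeholders.
ALTER SESSION ENABLE PARALLEL DML;

UPDATE /*+ PARALLEL(t, 4) */ main_table t
SET    (col_a, col_b) = (SELECT w.col_a, w.col_b
                         FROM   work_table w
                         WHERE  w.key_col = t.key_col)
WHERE  EXISTS (SELECT 1
               FROM   work_table w
               WHERE  w.key_col = t.key_col);

COMMIT;

A MERGE is another way to express the same thing. Note that in 9i the statement only truly runs in parallel if the main table is partitioned, but even serially one set-based pass beats 90,000 individual updates.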

As for all of your transformation/reference logic, it sounds like it's in a single job. If you were staging inserts and updates to file instead of straight into the database, you could run multiple instances of that transformation job, spreading the processing across as many instances as you have CPUs. You would then greatly increase the throughput and reduce the overall runtime, as well as using more of the available machine power.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
srinagesh
Participant
Posts: 125
Joined: Mon Jul 25, 2005 7:03 am

Post by srinagesh »

Sorry for the late reply. I was off sick and hence couldn't attend to my work.

I have tried loading the data into /dev/null as suggested and, amazingly, it was very fast.

I think the database is the culprit here.

I have tweaked the array sizes and transaction sizes, but there is still no great improvement in performance.

I have also tried modifying the job into two streams: one to handle the new inserts and one to insert all the update values into a work table.

This part is pretty fast, but when I try to update my main table (2,973,874 records) with the data in the work table, things go for a toss.

The performance deteriorates once again.

Any suggestions / workarounds?

-Nagesh
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

srinagesh,

You will need to talk to your DBA here and have him or her monitor the database while you are loading. They will certainly find something (since there is always something to find) that can improve your job's performance. The indexes are going to be a great place to start; if you can split your job into one link doing the updates and another doing the inserts (with a bulk load), you will see improvements, especially if you can defer all those updates until the end of the load.
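(One thing the DBA may well suggest, sketched here with placeholder index names: take the bitmap indexes out of play during the load and rebuild them afterwards, so they are not maintained row by row:)

-- Before the load (repeat for each of the 16 bitmap indexes; names are placeholders).
ALTER SESSION SET skip_unusable_indexes = TRUE;
ALTER INDEX target_bmx_01 UNUSABLE;

-- After the load, rebuild each index in a single pass.
ALTER INDEX target_bmx_01 REBUILD;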