Job Performance for simple server job?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

jtseltmann
Participant
Posts: 8
Joined: Mon Feb 11, 2002 6:35 pm

Job Performance for simple server job?

Post by jtseltmann »

I have a couple of quick questions for the experts. I have done a bunch of searches and read a lot of posts this morning on this topic. I know there are many parameters involved, so I'll try to be as explicit as possible.

I have a basic server job that looks like this:
Hashed File (type 30) --> Trans1 (with one reference lookup from a type 30 hashed file) --> ODBC stage (SQL database)

I am seeing some serious performance issues (which get progressively worse as the job runs). Throughput was running at somewhere around 39 rows/second! At that rate, the roughly 1.6 million records being read from the hashed file would take over 11 hours to load, and the job is doing only inserts on the ODBC side. Rows Per Transaction was set to 0 and Array Size to 1. Row buffering has not been enabled for the job, and all buffer defaults are set at 128K.

I am not sure whether enabling row buffering will help our performance. I do have a before-stage subroutine and an after-stage subroutine that share a COMMON variable. I have read that row buffering may disrupt how that works, but I wasn't sure whether my scenario would be affected.

Can anyone give me a few suggestions as to how I may tweak the parameters/options to improve the writes to ODBC/SQL?

Thank you to anyone who can give me some guidance or reference stuff to read up on.
Jeff Seltmann
ak77
Charter Member
Posts: 70
Joined: Thu Jun 23, 2005 5:47 pm
Location: Oklahoma

Post by ak77 »

Hi

I'm not an expert, but just trying to understand.

The ODBC stage is supposed to be slower.
I am not sure why you are using a hashed file unless you are doing a lookup;
I would rather use a Sequential File stage.

Also, the before/after job subroutines won't be affected by row buffering.

I may be wrong.

I will let the experts help you better.

Kishan
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

First off, put a constraint on your Transformer that always evaluates to false (I usually use "1=2") and see what the performance of your job is without writing to the database. I am fairly certain that the speed will be in the thousands of rows per second (even faster if you pre-load your reference hashed file into memory).

Most likely you will need to concentrate on your database write. The easiest way to do this is with the bulk loader, if you are doing pure inserts. If that is not possible and you need to stick with ODBC (SQL Server?), then check that there are no extraneous constraints or triggers on the table(s) and remove them if possible. Increase your array and transaction size in increments until you get optimal speed.
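For illustration only, here is a minimal sketch (in Python with pyodbc, outside DataStage) of what the ODBC stage's Array Size and Rows Per Transaction roughly correspond to at the driver level: rows batched per execute versus rows between commits. The DSN, table, column names, and the source_rows() generator are all made up for the example.

Code:

import pyodbc

ARRAY_SIZE = 350      # rows sent per execute, analogous to the stage's Array Size
ROWS_PER_TXN = 1000   # rows between commits, analogous to Rows Per Transaction (0 = one commit at the end)

def source_rows():
    # Hypothetical stand-in for the hashed-file input: yields (key, value) tuples.
    for i in range(10000):
        yield i, "value %d" % i

conn = pyodbc.connect("DSN=TargetDB;UID=etl;PWD=secret", autocommit=False)  # placeholder connection
cur = conn.cursor()
cur.fast_executemany = True   # send parameter arrays instead of row-by-row round trips

insert_sql = "INSERT INTO target_table (key_col, val_col) VALUES (?, ?)"

batch, rows_since_commit = [], 0
for row in source_rows():
    batch.append(row)
    if len(batch) >= ARRAY_SIZE:
        cur.executemany(insert_sql, batch)   # one round trip for the whole batch
        rows_since_commit += len(batch)
        batch = []
        if rows_since_commit >= ROWS_PER_TXN:
            conn.commit()                    # end the transaction every ROWS_PER_TXN rows
            rows_since_commit = 0
if batch:
    cur.executemany(insert_sql, batch)
conn.commit()

The sweet spot for both settings depends on the driver and the network, which is why stepping the values up in increments is worthwhile.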

Can you make this job a multi-instance one, with each instance getting just a subset of the data, and then run the instances in parallel? This will increase performance in some, but not all, cases (see the sketch below).
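The partitioning idea is roughly what a MOD()-style constraint in each instance's Transformer would do; here is a toy sketch of it (Python for illustration only, with a hypothetical instance count and sample data).

Code:

NUM_INSTANCES = 4   # hypothetical number of parallel job instances

def rows_for_instance(all_rows, instance_no):
    # Each instance keeps only the keys that hash to its own slot,
    # so the full data set is split into non-overlapping subsets.
    for key, value in all_rows:
        if hash(key) % NUM_INSTANCES == instance_no:
            yield key, value

# Example: instance 2 of 4 processes only its own share of the rows.
sample = [(i, "value %d" % i) for i in range(10)]
print(list(rows_for_instance(sample, 2)))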

Is your system configured optimally? Does the database reside on the same machine as the DataStage engine?
jtseltmann
Participant
Posts: 8
Joined: Mon Feb 11, 2002 6:35 pm

Post by jtseltmann »

Thanks for the suggestions... I have already incorporated a few and gotten much better results:

-- The reference hashed file is already preloaded to memory, since I don't write to it.
-- I changed Rows Per Transaction to 1000 and Array Size to 350, and the bulk of the job now seems to be processing at about 800-900 rows/sec.

I am almost positive it's the database write that is slowing things down. There are no triggers on the table being written to.

The DB being written to is not on the same server as the DS app.

Might using row buffering help? And it won't have an effect on my before/after-stage routines that use the COMMON area?

Thanks again for all the help and suggestions!
Jeff Seltmann
jtseltmann
Participant
Posts: 8
Joined: Mon Feb 11, 2002 6:35 pm

Post by jtseltmann »

Just another note on testing: the changes above resulted in about 900 rows/sec until roughly a million rows had been processed. After that the rate began to decline slowly. It is now running at about 675 rows/sec and still dropping... but that is tons better than it was.

Does anyone know why this would occur, or are there just too many variables?

thanks again!
Jeff Seltmann
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Too many variables. While the job is busy transforming rows, the database sees that its attention is not 100% needed, and you lose momentum. Constant round trips mean you are never dedicated to one task at a time. You'd probably find that spooling to a file gives outstanding performance, and that another job reading that file and just loading it into the table is fast as well. The cumulative time for those two jobs will be less than the time for the all-in-one job.
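For what it's worth, a minimal sketch of that two-step pattern (Python/pyodbc again, rather than the actual pair of DataStage jobs; the file path, DSN, table, columns, and source_rows() generator are invented):

Code:

import csv
import pyodbc

SPOOL_FILE = "/tmp/target_load.csv"   # hypothetical landing file

def source_rows():
    # Hypothetical stand-in for the transformed rows: yields (key, value) tuples.
    for i in range(10000):
        yield i, "value %d" % i

# "Job" 1: transform and land the rows to a flat file; no database involved,
# so the transform runs at full speed.
with open(SPOOL_FILE, "w", newline="") as f:
    writer = csv.writer(f)
    for key, value in source_rows():
        writer.writerow([key, value])

# "Job" 2: load-only pass; this session does nothing but push batched inserts.
conn = pyodbc.connect("DSN=TargetDB;UID=etl;PWD=secret", autocommit=False)  # placeholder connection
cur = conn.cursor()
cur.fast_executemany = True
with open(SPOOL_FILE, newline="") as f:
    rows = list(csv.reader(f))
cur.executemany("INSERT INTO target_table (key_col, val_col) VALUES (?, ?)", rows)
conn.commit()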

Break down the job and tune each component; small and modular pieces will let you see the bottlenecks and improve them. You'll also gain instantiation capabilities you currently don't have.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

As Ken has already stated, there are too many changes and variables.

Find out what your job can do by constraining off the output to the database or changing your target to a flat file. This will give you the maximum throughput this job can achieve. Have you done this, and if so, what is the speed?

Now that you have a baseline to work with, you can start working on your target - but change one thing at a time. Change your transaction size and see whether it has an overall effect. Then your array size. Are you doing inserts, or upserts? If the latter, is it update-then-insert or the other way around? The order can make a huge difference (a rough sketch of the two orderings is below).
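To make the ordering point concrete, here is a minimal sketch of the two upsert orderings (Python/pyodbc for illustration only; table and column names are hypothetical). If most incoming rows already exist in the target, update-then-insert wastes fewer statements; if most rows are new, insert-then-update does.

Code:

import pyodbc

def update_then_insert(cur, key, value):
    # Cheapest when most incoming rows already exist in the target.
    cur.execute("UPDATE target_table SET val_col = ? WHERE key_col = ?", (value, key))
    if cur.rowcount == 0:   # nothing was updated, so the row must be new
        cur.execute("INSERT INTO target_table (key_col, val_col) VALUES (?, ?)", (key, value))

def insert_then_update(cur, key, value):
    # Cheapest when most incoming rows are new.
    try:
        cur.execute("INSERT INTO target_table (key_col, val_col) VALUES (?, ?)", (key, value))
    except pyodbc.IntegrityError:   # duplicate key, so the row already exists
        cur.execute("UPDATE target_table SET val_col = ? WHERE key_col = ?", (value, key))

Both routines expect a cursor from an open pyodbc connection and assume a unique key on key_col.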