Performance Issue

Post questions here related to DataStage Server Edition in such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


cosec
Premium Member
Posts: 230
Joined: Tue May 08, 2007 8:10 pm

Performance Issue

Post by cosec »

Hello,


My DS job with the following settings took about 5 hours for the extraction and load, which also includes lookups and some calculations. Please advise what parameters I should change to get better performance.

My design is summarized as follows:

A DB2 stage does a SELECT joining 10 different tables based on certain criteria, with Prefetch set to 50. (For this particular day the number of applicable records is about 150,000.) These 10 tables are growing day by day.

The data goes through Transformer1, which does a lookup,

then through Transformer2, which does another lookup,

then through Transformer3, which does some calculations,

and finally loads into a DB2 stage (Array Size = 1, Transaction Size = 1000).
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's no magic parameter. You need to determine where the 'bottleneck' is. Break off the first stage from the job into a new one and write to a sequential file with an @FALSE constraint. How long does it take? Add another stage, move the constraint down. Lather, rinse, repeat. At what step do you hit your 5 hour mark?

Are your lookups hashed files or database stages? What kind of 'calculations' are you performing in the third transformer? What update action are you using in the DB2 target stage, and why are you using an Array Size of 1? What is the Array Size of your source stage?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Look outside of DataStage too. You have (apparently) a ten-way join happening in DB2. Is this being properly assisted by indexes? Are there some rows that would never be needed that could be eliminated by other WHERE constraints, also assisted by indexes?
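
As a purely hypothetical sketch (the real table and column names never appear in this thread), the usual first step is indexes on the join keys and on the daily filter column, followed by RUNSTATS so the optimizer knows they exist:

Code: Select all

-- Hypothetical names throughout; substitute the real join keys.
CREATE INDEX ix_txn_acct ON transactions (acct_id);
-- Index the column used by the daily WHERE filter as well.
CREATE INDEX ix_txn_date ON transactions (txn_date);
-- Refresh statistics so the DB2 optimizer will consider the new indexes.
RUNSTATS ON TABLE myschema.transactions AND INDEXES ALL;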

Make a little job that does nothing else than extract from DB2 (send to a text file, but with @FALSE as the Transformer constraint, so that rows are read but never written by the job). How long does that take?
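
Something like the following, with made-up table names standing in for the real ten-table join, also lets you time the database work by itself from the DB2 command line:

Code: Select all

-- Hypothetical sketch: only two of the ten tables are shown, for shape.
-- COUNT(*) makes DB2 do the full join and WHERE work without the cost
-- of shipping 150,000 rows back to a client.
SELECT COUNT(*)
FROM   accounts     a
JOIN   transactions t ON t.acct_id = a.acct_id
WHERE  t.txn_date = CURRENT DATE - 1 DAY;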
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
cosec
Premium Member
Posts: 230
Joined: Tue May 08, 2007 8:10 pm

Post by cosec »

Hello, thanks for your responses.

The lookups are all hashed files. There is no option to input the array size in the source stage. If you are referring to the target, then I used 1 as the array size in order to capture rejects.

In the monitoring tool I noticed that the extraction ran at a mere 9 rows per second.

The SQL costing was about 65,000 timerons. Maybe I need to get the DBA to check on the indexes and spend some time fine-tuning the query.
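
For what it's worth, here is a hedged sketch of how to pull that timeron estimate from DB2 itself. It assumes the explain tables have already been created (from the EXPLAIN.DDL script shipped with DB2) and uses made-up table names:

Code: Select all

-- Populate the explain tables with the plan for the (hypothetical) query.
EXPLAIN PLAN FOR
SELECT COUNT(*)
FROM   accounts     a
JOIN   transactions t ON t.acct_id = a.acct_id
WHERE  t.txn_date = CURRENT DATE - 1 DAY;

-- TOTAL_COST is the optimizer's estimate in timerons ('P' = chosen plan).
SELECT total_cost
FROM   explain_statement
WHERE  explain_level = 'P'
ORDER BY explain_time DESC
FETCH FIRST 1 ROW ONLY;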
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Another reason that rows/sec is a useless metric is the case where the extraction query involves sorting. The clock starts running but the query returns no rows until the sorting is complete. Then rows are probably delivered at quite a fast rate but, because the clock has been running since the query was issued, the reported rows/sec can be quite dismal. For instance, 150,000 rows spread over a five-hour run averages out to only about 8 rows/sec, however fast the rows actually arrived once the sorting finished.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
cosec
Premium Member
Posts: 230
Joined: Tue May 08, 2007 8:10 pm

Post by cosec »

Hello Ray,

The problem was that there weren't any indexes created on the base tables besides the defaults.
It was suggested that we add some indexes, which may boost performance by 80%.

Also, on the loading part, as I mentioned earlier, the Array Size is set to 1 and the Transaction Size to 1000. The reason I set the Array Size to 1 is so that rejects will be captured in a sequential file. Will I see an improvement if I change the Array Size to 100 and the Transaction Size to 10000? Also, would this mean my rejects will not be captured properly?


ray.wurlod wrote:Another reason that rows/sec is a useless metric is the case where the extraction query involves sorting. The clock starts running but the query returns no rows until the sorting is complete. Then r ...
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

cosec wrote: Will I see an improvement if I change the Array Size to 100 and the Transaction Size to 10000?
Probably. Change it and let us know.
cosec also wrote: Also, would this mean my rejects will not be captured properly?
If you were using Oracle, the answer would be an emphatic YES with any array size greater than one. Your reject count would be correct, but the actual record written to the reject file wouldn't be correct. Can't speak to whether the DB2 stage has the same issue, but I'd wager it does. Test it yourself and let us know.
-craig

"You can never have too many knives" -- Logan Nine Fingers