Performance Issue

Post questions here related to DataStage Server Edition in such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


cosec
Premium Member
Posts: 230
Joined: Tue May 08, 2007 8:10 pm

Performance Issue

Post by cosec »

Hello,


My DS job with the following settings took about 5 hours for the extraction and load, which also includes lookups and some calculations. Please advise what parameters I should change to get better performance.

My design is summarized as follows:

A DB2 stage does a SELECT joining 10 different tables based on certain criteria, with Prefetch set to 50. (For this particular day the number of applicable records is about 150,000.) These 10 tables are growing day by day.

The data goes through Transformer1, which does a lookup,

then through Transformer2, which does another lookup,

then through Transformer3, which does some calculations,

and finally loads into a DB2 stage (Array Size = 1, Transaction Size = 1000).
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's no magic parameter. You need to determine where the 'bottleneck' is. Break off the first stage from the job into a new one and write to a sequential file with an @FALSE constraint. How long does it take? Add another stage, move the constraint down. Lather, rinse, repeat. At what step do you hit your 5 hour mark?

Are your lookups hashed files or database stages? What kind of 'calculations' are you performing in the third transformer? What update action are you using in the DB2 target stage, and why are you using an Array Size of 1? What is the Array Size of your source stage?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Look outside of DataStage too. You have (apparently) a ten-way join happening in DB2. Is this being properly assisted by indexes? Are there some rows that would never be needed that could be eliminated by other WHERE constraints, also assisted by indexes?
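
As a purely hypothetical sketch (the real table and column names never appear in this thread), the usual first step is indexes on the join keys and on the daily filter column, followed by RUNSTATS so the optimizer knows they exist:

Code: Select all

-- Hypothetical names throughout; substitute the real join keys.
CREATE INDEX ix_txn_acct ON transactions (acct_id);
-- Index the column used by the daily WHERE filter as well.
CREATE INDEX ix_txn_date ON transactions (txn_date);
-- Refresh statistics so the DB2 optimizer will consider the new indexes.
RUNSTATS ON TABLE myschema.transactions AND INDEXES ALL;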

Make a little job that does nothing else than extract from DB2 (send to a text file, but with @FALSE as the Transformer constraint, so that rows are read but never written by the job). How long does that take?
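
Something like the following, with made-up table names standing in for the real ten-table join, also lets you time the database work by itself from the DB2 command line:

Code: Select all

-- Hypothetical sketch: only two of the ten tables are shown, for shape.
-- COUNT(*) makes DB2 do the full join and WHERE work without the cost
-- of shipping 150,000 rows back to a client.
SELECT COUNT(*)
FROM   accounts     a
JOIN   transactions t ON t.acct_id = a.acct_id
WHERE  t.txn_date = CURRENT DATE - 1 DAY;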
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
cosec
Premium Member
Posts: 230
Joined: Tue May 08, 2007 8:10 pm

Post by cosec »

Hello, thanks for your responses.

The lookups are all hashed files. There is no option to input the array size in the source stage. If you are referring to the target, then I used 1 as the array size in order to capture rejects.

In the monitoring tool I noticed that the extraction ran at a mere 9 rows per second.

The SQL costing was about 65,000 timerons. Maybe I need to get the DBA to check on the indexes and spend some time fine-tuning the query.
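
For what it's worth, here is a hedged sketch of how to pull that timeron estimate from DB2 itself. It assumes the explain tables have already been created (from the EXPLAIN.DDL script shipped with DB2) and uses made-up table names:

Code: Select all

-- Populate the explain tables with the plan for the (hypothetical) query.
EXPLAIN PLAN FOR
SELECT COUNT(*)
FROM   accounts     a
JOIN   transactions t ON t.acct_id = a.acct_id
WHERE  t.txn_date = CURRENT DATE - 1 DAY;

-- TOTAL_COST is the optimizer's estimate in timerons ('P' = chosen plan).
SELECT total_cost
FROM   explain_statement
WHERE  explain_level = 'P'
ORDER BY explain_time DESC
FETCH FIRST 1 ROW ONLY;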
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Another reason that rows/sec is a useless metric is the case where the extraction query involves sorting. The clock starts running but the query returns no rows until the sorting is complete. Then rows are probably delivered at quite a fast rate but, because the clock has been running since the query was issued, the reported rows/sec can be quite dismal. For instance, 150,000 rows spread over a five-hour run averages out to only about 8 rows/sec, however fast the rows actually arrived once the sorting finished.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
cosec
Premium Member
Posts: 230
Joined: Tue May 08, 2007 8:10 pm

Post by cosec »

Hello Ray,

The problem was that there weren't any indexes created on the base tables besides the defaults.
It was suggested that we add some indexes, which may boost performance by 80%.

Also, on the loading part, as I mentioned earlier, the Array Size is set to 1 and the Transaction Size to 1000. The reason I set the Array Size to 1 is so that rejects will be captured in a sequential file. Will I see an improvement if I change the Array Size to 100 and the Transaction Size to 10000? Also, would this mean my rejects will not be captured properly?


ray.wurlod wrote:Another reason that rows/sec is a useless metric is the case where the extraction query involves sorting. The clock starts running but the query returns no rows until the sorting is complete. Then r ...
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

cosec wrote: Will I see an improvement if I change the Array Size to 100 and the Transaction Size to 10000?
Probably. Change it and let us know.
cosec also wrote: Also, would this mean my rejects will not be captured properly?
If you were using Oracle, the answer would be an emphatic YES with any array size greater than one. Your reject count would be correct, but the actual record written to the reject file wouldn't be correct. Can't speak to whether the DB2 stage has the same issue, but I'd wager it does. Test it yourself and let us know.
-craig

"You can never have too many knives" -- Logan Nine Fingers