
regarding Performance

Posted: Tue Jan 31, 2006 2:46 am
by sudhakar_viswa
Hi,

How many records are needed to check performance? Usually I take 10 to 20 records.

Thanks,
sudhakar

Posted: Tue Jan 31, 2006 2:57 am
by ArndW
That is a very small sample and won't return meaningful results. Any slight change in the system (a couple of other processes doing a bit of work during your sampling period) will give you wildly different results.

For DataStage jobs I won't use any sample shorter than about a 5-minute run, and preferably longer. I'll also repeat that run several times over a period of time to see whether I get a large standard deviation in the speeds achieved.
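A minimal sketch of the "run it several times and check the standard deviation" step, assuming you have already recorded the throughput (rows/second) of each repeated run. The run figures and the `summarize` helper are hypothetical and for illustration only; the coefficient of variation (std. deviation divided by mean) is added here as one convenient way to judge whether the spread is "large", not a DataStage metric.

```python
import statistics

# Hypothetical throughput (rows/second) from five repeated runs of the same job.
# Illustrative numbers only, not measurements from any real system.
run_rates = [4120.0, 3980.0, 4050.0, 4210.0, 3940.0]


def summarize(rates):
    """Return mean, sample standard deviation, and coefficient of variation."""
    mean = statistics.mean(rates)
    stdev = statistics.stdev(rates)  # sample std. deviation; needs at least 2 runs
    return mean, stdev, stdev / mean


mean, stdev, cv = summarize(run_rates)
print(f"mean={mean:.0f} rows/s  stdev={stdev:.0f} rows/s  cv={cv:.1%}")
```

A small coefficient of variation across runs suggests the sample was long enough to even out background load; a large one means the runs should be longer or repeated.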

Posted: Tue Jan 31, 2006 3:03 am
by sudhakar_viswa
Hi Arnd,

I want the actual number, i.e. how many records are needed to check the performance.

bye,
sudhakar

Posted: Tue Jan 31, 2006 3:17 am
by ArndW
The answer is enough rows to make your job run for at least several minutes. I don't know your job or configuration; some installations are happy to get 500 rows/second while others get 40,000 rows/second.
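Because the row count follows from throughput rather than being a fixed number, the arithmetic is simply rows = rows/second × target runtime in seconds. A minimal sketch, assuming a roughly constant throughput measured from a trial run on your own system; `rows_for_runtime` is a hypothetical helper, and the 5-minute target is the minimum mentioned earlier in the thread.

```python
def rows_for_runtime(rows_per_second, target_minutes):
    """Rows needed for a job at the given throughput to run for target_minutes."""
    return int(rows_per_second * target_minutes * 60)


# The two throughput figures quoted above, with a 5-minute target run.
for rate in (500, 40_000):
    print(f"{rate} rows/s -> {rows_for_runtime(rate, 5)} rows")
```

That works out to 150,000 rows at 500 rows/second but 12,000,000 rows at 40,000 rows/second, which is why no single row count applies everywhere.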

The sample should be large enough to even out other system factors. The standard deviation across repeated runs should be small; with 10-20 records your deviation will be huge and the resulting statistics won't mean anything, even on a lightly loaded Windows server. There are caches and buffers built into every part of a system (disk drive, disk controller, disk buffer memory, CPU cache, etc.), so with a small sample you might get some great speeds simply because everything is done in cache.

That reminds me of a performance monitoring test I wrote for a large health insurance company that was moving to an EMC disk array. I had it fire off 3,000 simulated users simultaneously; together they ran hundreds of thousands of queries, processed the results and wrote data back. The test was supposed to stress the disk I/O subsystem for at least 12 hours, but it finished in under 5 seconds because the EMC array had stored the whole database in its cache...

Posted: Tue Jan 31, 2006 5:57 am
by sudhakar_viswa
Hi Arnd,

Thanks for your reply. I am asking in general, not about my specific scenario.

Thanks,
sudhakar

Posted: Tue Jan 31, 2006 6:07 am
by ArndW
Sudhakar,

I am trying to illustrate that there is no set minimum number of rows that gives reliable performance statistics. You need to reach a minimum job runtime (the longer the better, so that job startup time plays a smaller role) and a low standard deviation between test runs. How many rows it takes to achieve this is irrelevant.
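To see why a longer runtime makes startup time matter less, here is a minimal sketch that computes what fraction of the total elapsed time is spent in job startup. It assumes a fixed startup cost and a constant processing rate; the 20-second startup, the 2,000 rows/second rate and the `startup_share` helper are all hypothetical values chosen for illustration.

```python
def startup_share(startup_seconds, rows, rows_per_second):
    """Fraction of total elapsed time spent in job startup (assumes constant throughput)."""
    processing_seconds = rows / rows_per_second
    return startup_seconds / (startup_seconds + processing_seconds)


# Hypothetical job: 20 s of startup, then a steady 2,000 rows/s.
for rows in (20, 20_000, 2_000_000):
    share = startup_share(20, rows, 2_000)
    print(f"{rows:>9} rows: startup is {share:.1%} of total runtime")
```

With 20 rows almost all of the measured time is startup, so the "speed" reported says nothing about row processing; with 2,000,000 rows startup drops to roughly 2% of the total.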