Hi,
To check performance, how many records are needed? Usually I take 10 to 20 records.
Thanks,
sudhakar
That is a very small sample and won't return meaningful results; any slight change in the system (a couple of other processes doing a bit of work during your sampling period) will give you wildly different results.
For DataStage jobs I won't use any sample of less than about a 5-minute run, and preferably longer. Plus, I'll run that several times over time to see whether I get a large standard deviation in the speeds achieved.
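A minimal sketch of that approach, assuming a hypothetical `run_job.sh` wrapper that starts the job and blocks until it finishes (substitute your own launcher, e.g. a `dsjob -run` invocation):

```python
import statistics
import subprocess
import time

# Hypothetical wrapper script that launches the job and blocks until it
# completes; replace with your own command.
JOB_COMMAND = ["./run_job.sh"]
RUNS = 5

elapsed = []
for i in range(RUNS):
    start = time.monotonic()
    subprocess.run(JOB_COMMAND, check=True)
    elapsed.append(time.monotonic() - start)
    print(f"run {i + 1}: {elapsed[-1]:.1f}s")

mean = statistics.mean(elapsed)
stdev = statistics.stdev(elapsed)  # sample standard deviation
print(f"mean {mean:.1f}s, std dev {stdev:.1f}s "
      f"({100 * stdev / mean:.0f}% of the mean)")

# A large relative deviation means the runs aren't comparable and any
# throughput figures derived from them won't mean much.
```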
The answer is enough rows to make your job run for at least several minutes. I don't know your job or configuration; some installations are happy to get 500 rows/second while others get 40,000 rows/second.
The sample should be large enough to even out other system factors. Your standard deviation across repeated runs should be small; with 10-20 records your deviation will be huge and the resulting statistics won't mean anything, even on a lightly loaded Windows server. There are caches and buffers built into every layer of a system (disk drive, disk controller, disk buffer memory, CPU cache, etc.), so with a small sample you might see some great speeds simply because everything is served from cache.

That reminds me of a performance monitoring test I wrote for a large health insurance company moving to an EMC disk array. I had it fire off 3,000 simultaneous users that ran hundreds of thousands of simulated queries, processed them, and wrote data back. The test was supposed to stress-test the disk I/O subsystem for at least 12 hours, but it finished in under 5 seconds because the EMC had stored the whole database in its cache...
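A common way to keep those caches from making repeated runs incomparable is to take a few untimed warm-up passes before measuring, so every measured run starts from the same (warm) state. Note this deliberately measures warm performance, which is the opposite of what a cold-start stress test like the one above needs. A rough sketch, with `run_once` standing in for whatever launches your workload:

```python
import statistics
import time

def benchmark(run_once, warmups=2, runs=5):
    """Time run_once() after discarding warm-up passes.

    The warm-up passes let disk, controller, and CPU caches settle so
    the measured runs all start from a comparable state.
    """
    for _ in range(warmups):
        run_once()  # untimed: just primes the caches
    timings = []
    for _ in range(runs):
        start = time.monotonic()
        run_once()
        timings.append(time.monotonic() - start)
    return statistics.mean(timings), statistics.stdev(timings)

if __name__ == "__main__":
    # Trivial stand-in workload; substitute your real job here.
    mean, dev = benchmark(lambda: sum(range(10**7)))
    print(f"mean {mean:.3f}s, std dev {dev:.3f}s")
```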
Sudhakar,
I am trying to illustrate that there is no set minimum number of rows that gives reliable performance statistics. You need to achieve a minimum job runtime (the longer the better, so that job startup time plays a smaller role) and a low standard deviation between test runs. How many rows it takes to do this is irrelevant.
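To see why, here is a toy model (both figures are invented purely for illustration): a job with a fixed 30-second startup cost and a true steady-state rate of 5,000 rows/second. With 20 rows the apparent rate is under 1 row/second; only a long run amortizes the startup cost away:

```python
# Toy model: apparent throughput of a job with a fixed startup overhead.
# The startup cost and steady-state rate below are made-up figures.
STARTUP_S = 30.0     # fixed job startup time, seconds
TRUE_RATE = 5_000.0  # steady-state rows per second

for rows in (20, 10_000, 1_000_000, 10_000_000):
    elapsed = STARTUP_S + rows / TRUE_RATE
    print(f"{rows:>10,} rows: {elapsed:8.1f}s elapsed, "
          f"{rows / elapsed:7.0f} rows/s apparent")
```

Running this shows the apparent rate climbing from under 1 row/second at 20 rows toward the true 5,000 rows/second only once the run is many minutes long, which is exactly why a minimum runtime, not a minimum row count, is the right target.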