regarding Performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

sudhakar_viswa
Participant
Posts: 85
Joined: Fri Nov 18, 2005 5:35 am

regarding Performance

Post by sudhakar_viswa »

Hi,

To check performance, how many records are needed? Usually I take 10 to 20 records.

Thanks,
sudhakar
i need to know datastage
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

That is a very small sample and won't return meaningful results; any slight change in the system (a couple of other processes doing a bit of work during your sampling period) will give you wildly different results.

For DataStage jobs I won't use any sample with a run of less than about 5 minutes, and preferably longer. Plus I'll repeat that run several times to see whether I get a large standard deviation in the speeds achieved.
sudhakar_viswa
Participant
Posts: 85
Joined: Fri Nov 18, 2005 5:35 am

Post by sudhakar_viswa »

Hi ARND,

I want the number, i.e. how many records are needed to check the performance.

bye,
sudhakar
i need to know datastage
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

The answer is: enough rows to make your job run for at least several minutes. I don't know your job or configuration; some installations are happy to get 500 rows/second while others get 40,000/second.

The sample should be large enough to even out other system factors. Your standard deviation across repeated runs should be small; with 10-20 records your deviation will be huge and the resulting statistics won't mean anything, even on a lightly loaded Windows server. There are caches and buffers built into every aspect of a system (disk drive, disk controller, disk buffer memory, CPU cache, etc.), so with a small sample you might see some great speeds simply because everything is served from cache.

That reminds me of a performance monitoring test I wrote for a large health insurance company moving to an EMC disk array. I had it fire off 3000 simultaneous users that issued hundreds of thousands of simulated user queries, processed them and wrote data back. The test was supposed to stress-test the disk I/O subsystem for at least 12 hours, but it completed in under 5 seconds because the EMC had stored the whole database in its cache...
sudhakar_viswa
Participant
Posts: 85
Joined: Fri Nov 18, 2005 5:35 am

Post by sudhakar_viswa »

Hi arnd,

Thanks for your reply. I am asking in general, not about my specific scenario.

Thanks,
sudhakar
i need to know datastage
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Sudhakar,

I am trying to illustrate that there is no set minimum number of rows that gives reliable performance statistics. You need to achieve a minimum job runtime (the longer the better, so that job startup time plays a smaller role) and a low standard deviation between test runs. How many rows it takes to do this is irrelevant.
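The rule of thumb above (time repeated runs, then check that the standard deviation of the achieved speeds is small before trusting them) can be sketched in a few lines of Python. The row count and run times below are made-up illustration values, not measurements from any real job:

```python
import statistics

# Hypothetical results: elapsed seconds for five repeated runs of the
# same job over the same input (all numbers invented for illustration).
ROWS_PROCESSED = 1_000_000
run_times_sec = [312.4, 305.9, 318.7, 309.2, 314.1]

# Throughput (rows/second) achieved by each run.
speeds = [ROWS_PROCESSED / t for t in run_times_sec]

mean_speed = statistics.mean(speeds)
stdev_speed = statistics.stdev(speeds)

# The relative deviation between runs should be small (a few percent)
# before the average throughput figure means anything.
relative_deviation = stdev_speed / mean_speed
print(f"mean: {mean_speed:.0f} rows/s, "
      f"stdev: {stdev_speed:.0f} rows/s ({relative_deviation:.1%})")
```

With 10-20 rows the run times would be dominated by job startup and cache effects, and this relative deviation would be large and erratic; with multi-minute runs it settles down and the mean becomes a usable figure.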