Performance

vijay.barani · Post by **vijay.barani** » Thu Mar 12, 2009 3:56 am

Dear Friends,
Could any of you let me know how to increase the throughput (number of rows/sec) of a job? What factors affect performance?
In my project some jobs have very high throughput, whereas a few of them are very slow, e.g. 30 rows/sec.

Thanks in Advance.

bkumar103 · Post by **bkumar103** » Thu Mar 12, 2009 4:47 am

It depends on the job architecture and the CPU load at the time the job is run.

sbass1 · Post by **sbass1** » Thu Mar 12, 2009 5:29 am

There are a lot of factors involved, but I've noticed that a lot of writes to the job log can really slow it down.

vijay.barani · Post by **vijay.barani** » Thu Mar 12, 2009 5:31 am

The architecture of some jobs is not very complicated, yet they run for more than an hour, whereas other jobs, in spite of their complexity, process the same number of records in 1-2 minutes.
Moreover, the CPU is dedicated to the project and only one job runs at a time.

bkumar103 wrote: It depends on the job architecture and the CPU load at the time the job is run.

vijay.barani · Post by **vijay.barani** » Thu Mar 12, 2009 6:01 am

I have covered many factors.
Every time I run the job, I clear the log file.
I've also checked the memory size and CPU distribution. There is one CPU with 16 GB RAM.
I've checked transaction size and array size too, and ran with different values.
No Sort stage is being used. All the data manipulations are done before the Transformation Stage.
Cache size is good enough (512 MB).

May I know what factors there are other than these?

sbass1 wrote: There are a lot of factors involved, but I've noticed that a lot of writes to the job log can really slow it down.

ray.wurlod · Post by **ray.wurlod** » Thu Mar 12, 2009 3:17 pm

Are you using inter-process row buffering to leverage multiple processes (assuming that you have more than one active stage)? Have you considered using multiple instances of the job each processing a distinct subset of the data, but running in parallel (assuming that you have spare resources)? Are you preventing warnings from being logged (through good design)? What is your target and how are you writing to it? (For example upsert is slowest, separate insert and update are better, bulk load is fastest for insert but may be even faster if you use a Sequential File stage to generate the data file and use a fixed control file.)
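
The multi-instance idea above (each instance processing a distinct subset of the data) can be pictured with a small sketch. This is plain Python, not DataStage code, and the names `instance_for_key`, `split_rows`, the `id` key field and the instance count of 4 are all hypothetical, chosen only for illustration. It assumes each row has a key that can be hashed, and it assigns every row to exactly one instance so that no two instances process the same row.

```python
import zlib


def instance_for_key(key, instance_count):
    """Map a row key to one of `instance_count` instances (0-based).

    crc32 is used because it gives the same result on every run,
    unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(str(key).encode("utf-8")) % instance_count


def split_rows(rows, key_field, instance_count):
    """Split rows into disjoint subsets, one per (hypothetical) job instance."""
    subsets = [[] for _ in range(instance_count)]
    for row in rows:
        subsets[instance_for_key(row[key_field], instance_count)].append(row)
    return subsets


# Illustrative data only: 1000 rows keyed by "id", split across 4 instances.
rows = [{"id": i, "value": i * 10} for i in range(1000)]
subsets = split_rows(rows, "id", 4)

for number, subset in enumerate(subsets):
    print(f"instance {number}: {len(subset)} rows")
```

In a real job the same effect would more likely come from a constraint or a WHERE clause on the source (for example a modulus on a numeric key, parameterised per instance), and it only helps if there are spare resources to run the instances concurrently, as noted above.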