Performance question

amsh76 · Post by **amsh76** » Fri Jul 08, 2005 12:27 pm

Is one big parallel job with a large number of stages more efficient than the same work divided into 2 or 3 separate jobs?

I always feel that simpler is better: easier to debug and maintain. Is that true for parallel jobs too, or is it the other way around?

pnchowdary · Post by **pnchowdary** » Fri Jul 08, 2005 12:52 pm

Hi amsh,

I feel that one big parallel job is better than splitting it into 2 or 3 smaller jobs. If you split the big job into smaller jobs and run them in sequence, possibly using a job sequencer, you get the same result but pay more overhead: each job has to initialize, start, process and write its log, not to mention the overhead of passing the status of the previous job to the next one. I am pretty sure there are exceptions, but this is my take on it.

Thanks,
Naveen
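The overhead argument above can be put into rough numbers. This is only a toy cost model in Python: the startup, hand-off, and per-stage timings are invented for illustration, and it treats the processing time as identical in both designs so that only the per-job overhead differs.

```python
# Toy cost model; every number here is a made-up assumption, not a measurement.

def single_job_seconds(startup, stage_seconds):
    """One job: pay the startup cost once, then run every stage."""
    return startup + sum(stage_seconds)


def split_jobs_seconds(startup, handoff, jobs):
    """Several jobs run in sequence: each pays its own startup cost, and
    each boundary between jobs adds a hand-off cost (landing and re-reading
    data, passing status through the sequencer)."""
    processing = sum(sum(stages) for stages in jobs)
    return startup * len(jobs) + handoff * (len(jobs) - 1) + processing


stages = [40, 25, 60, 15, 30, 20]  # hypothetical per-stage seconds
one_big = single_job_seconds(startup=20, stage_seconds=stages)
three_small = split_jobs_seconds(
    startup=20,
    handoff=30,
    jobs=[stages[0:2], stages[2:4], stages[4:6]],
)
print(one_big, three_small)
```

With these assumed figures the split design is slower only by the extra startups and hand-offs, so whether the difference matters depends on how large those are relative to the real processing time.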

amsh76 · Post by **amsh76** » Fri Jul 08, 2005 2:29 pm

Thanks Naveen, I get your point, but won't that make the job harder to debug and maintain?

pnchowdary · Post by **pnchowdary** » Fri Jul 08, 2005 3:07 pm

Hi amsh,

It's true that it will be harder to debug and maintain, but that's the price of better performance. So you need to decide the design of your job based on which factors are more important to you.

Thanks,
Naveen

rasi · Post by **rasi** » Fri Jul 08, 2005 10:57 pm

In a big parallel job you are not parking intermediate data in a database, dataset, sequential file, etc. This makes a big difference because there is less I/O. Whenever you need to debug the big job, you can break it into small pieces, fix the problem, and put the pieces back into your big job.
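To make the I/O point concrete, here is a minimal Python sketch, not DataStage code. The stage functions and the file name `stage1.csv` are hypothetical. The first runner chains all stages in memory; the second splits the same logic into two "jobs" and lands the intermediate rows in a file between them.

```python
import os
import tempfile


def read_rows(n):
    """Source stage: yields n hypothetical rows."""
    for i in range(n):
        yield {"id": i, "amount": i * 10}


def keep_even(rows):
    """Filter stage: keep rows with an even id."""
    return (r for r in rows if r["id"] % 2 == 0)


def add_total(rows):
    """Transform stage: add a derived column."""
    return ({**r, "total": r["amount"] + r["amount"] // 10} for r in rows)


def run_single_job(n):
    """All stages chained; rows stream from stage to stage in memory."""
    return list(add_total(keep_even(read_rows(n))))


def run_split_jobs(n, workdir):
    """Same logic as two jobs, with the intermediate result landed on disk."""
    landed = os.path.join(workdir, "stage1.csv")
    with open(landed, "w") as f:  # job 1: extra write
        for r in keep_even(read_rows(n)):
            f.write(f"{r['id']},{r['amount']}\n")
    with open(landed) as f:  # job 2: extra read and parse
        rows = (
            {"id": int(a), "amount": int(b)}
            for a, b in (line.strip().split(",") for line in f)
        )
        return list(add_total(rows))


with tempfile.TemporaryDirectory() as d:
    same = run_single_job(1000) == run_split_jobs(1000, d)
print(same)
```

Both runners produce the same rows; the split version simply adds one full write and one full read of the intermediate data. It also shows the debugging approach described above: the landed file is a place to inspect the data between the two halves.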

elavenil · Post by **elavenil** » Sat Jul 09, 2005 2:52 am

A big parallel job gives good performance, but you also need to consider the available resources when running parallel jobs concurrently. Parallel jobs grab whatever resources are available on the server.

Regards
Saravanan

vmcburney · Post by **vmcburney** » Sun Jul 10, 2005 7:22 pm

I have found parallel jobs to have more stages in them than server jobs. Better performance is achieved by not writing to disk, so our jobs try to carry the data further. There is also a tendency to move functions away from external products such as the RDBMS engine and Unix scripts into parallel stages. Where I might have done a join in a DB stage in a server job, I find the parallel sort and join stages can be more efficient. The same goes for a parallel sort versus a Unix sort.

There is still a requirement for a robust approach to rollback and recovery. We still land our data to a staging area prior to processing, and land it again to load-ready files prior to database loads.
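One way to picture why landed files help with recovery is the Python sketch below. It is an illustration under assumptions, not how DataStage implements restart: `run_step`, the `extract` function, and the file name `staged.txt` are all hypothetical. A step is skipped when its landed output already exists, so a rerun after a failure resumes from the last completed landing point.

```python
import os
import tempfile


def run_step(output_path, produce):
    """Run a step only if its landed output is missing.

    The step writes to a temporary name and renames at the end, so a
    half-written file is never mistaken for a completed step.
    """
    if os.path.exists(output_path):
        return "skipped"
    tmp_path = output_path + ".tmp"
    with open(tmp_path, "w") as f:
        for line in produce():
            f.write(line + "\n")
    os.replace(tmp_path, output_path)
    return "ran"


def extract():
    """Hypothetical extract step returning two rows."""
    return ["1,alice", "2,bob"]


with tempfile.TemporaryDirectory() as d:
    staged = os.path.join(d, "staged.txt")
    first = run_step(staged, extract)   # file absent, so the step runs
    second = run_step(staged, extract)  # rerun: the landed file is reused
print(first, second)
```

The trade-off is the one discussed throughout the thread: each landing point costs extra I/O but gives a place to restart from.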