Performance question

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

amsh76
Charter Member
Posts: 118
Joined: Wed Mar 10, 2004 10:58 pm

Post by amsh76 »

Is one big parallel job with many stages more efficient than the same work divided into 2 or 3 separate jobs?

I always feel that simpler is better, and easier to debug and maintain. Is that true for parallel jobs as well, or is it the other way around?
pnchowdary
Participant
Posts: 232
Joined: Sat May 07, 2005 2:49 pm
Location: USA

Post by pnchowdary »

Hi amsh,

I feel that one big parallel job is better than splitting it into 2 or 3 smaller jobs. If you split the big job into smaller jobs and run them in sequence, possibly using a job sequencer, to achieve the same result, you add overhead: each job has to be initialized, started, run, and logged, and the status of each job has to be passed on to the next one. I am pretty sure there are exceptions, but this is my take on it.

Thanks,
Naveen
amsh76
Charter Member
Posts: 118
Joined: Wed Mar 10, 2004 10:58 pm

Post by amsh76 »

Thanks Naveen, I get your point... but won't that make it harder to debug and maintain?
pnchowdary
Participant
Posts: 232
Joined: Sat May 07, 2005 2:49 pm
Location: USA

Post by pnchowdary »

Hi amsh,

It's true that it will be harder to debug and maintain, but that's the price of better performance. So you need to decide on the design of your job based on which factors matter most to you.


Thanks,
Naveen
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

In a big parallel job you are not landing intermediate data to a database, dataset, sequential file, etc. This makes a big difference because there is less I/O. Whenever you need to debug the big job, you can break it into small pieces, fix the problem, and put the pieces back into your big job.
Regards
Siva

Listening to the Learned

"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

A big parallel job gives good performance, but you also need to consider the available resources when running parallel jobs concurrently. Parallel jobs grab whatever resources are available on the server.

Regards
Saravanan
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

I have found that parallel jobs tend to have more stages in them than server jobs. Better performance is achieved by not writing to disk, so our jobs try to carry the data further. There is also a tendency to move functions away from external products such as the RDBMS engine and Unix scripts into parallel stages. Where I might have done a join in a DB stage in a server job, I find the parallel sort and join stages can be more efficient. The same goes for a parallel sort versus a Unix sort.

There is still a requirement for a robust approach to rollback and recovery. We still land our data to a staging area prior to processing and land it again to load-ready files prior to database loads.
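
As a rough illustration of the I/O trade-off discussed in this thread, here is a minimal sketch in plain Python. It is only an analogy, not DataStage code: the stage functions (extract, transform, keep_even), the record layout, and the file name stage1.jsonl are all hypothetical. The single-job version streams records from stage to stage in memory, while the split version lands the intermediate result to disk and reads it back, which costs extra I/O but leaves a file you can inspect or restart from.

```python
import json
import os
import tempfile


def extract(n):
    """Hypothetical source stage: yields n simple records."""
    for i in range(n):
        yield {"id": i, "amount": i * 10}


def transform(records):
    """Hypothetical transform stage: adds a derived column."""
    for rec in records:
        yield {**rec, "doubled": rec["amount"] * 2}


def keep_even(records):
    """Hypothetical filter stage: keeps records with an even id."""
    for rec in records:
        if rec["id"] % 2 == 0:
            yield rec


def run_single_job(n):
    """One job: stages are chained and records stream through memory."""
    return list(keep_even(transform(extract(n))))


def run_split_jobs(n, staging_dir):
    """Two jobs: the first lands its output to disk, the second reads it back."""
    landed = os.path.join(staging_dir, "stage1.jsonl")

    # Job 1: extract + transform, then write every record to a staging file.
    with open(landed, "w") as out:
        for rec in transform(extract(n)):
            out.write(json.dumps(rec) + "\n")

    # Job 2: re-read the staged file and apply the remaining stage.
    with open(landed) as src:
        return list(keep_even(json.loads(line) for line in src))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as staging:
        # Both designs produce the same rows; only the split design touches disk.
        assert run_single_job(1000) == run_split_jobs(1000, staging)
```

Both functions return the same rows, so the choice is about cost and recoverability rather than results: the staged file is the extra write and read that the single job avoids, and also the restart point that the single job lacks.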