Performance problems: running multiple jobs in parallel

Post questions here relating to DataStage Enterprise/PX Edition for such areas as parallel job design, parallel datasets, BuildOps, Wrappers, etc.


verify
Premium Member
Posts: 99
Joined: Sun Mar 30, 2008 8:35 am

Post by verify »

We are using UNIX programs to invoke DataStage jobs. There are 300 jobs that load different types of sequential flat files into an Oracle database. DataStage picks up these flat files one at a time, i.e. the job for file2 starts only after file1 has been processed. We have noticed that server CPU utilization never exceeds 5% while the jobs run; 95% of the CPU is sitting idle.
Is there a way I can call 10 instances of my UNIX program that invokes DataStage jobs, to improve performance? Or please suggest a better way of achieving this.
RK Raju
dnsjain
Charter Member
Posts: 34
Joined: Thu May 08, 2003 2:12 pm

Post by dnsjain »

You can improve performance by calling multiple jobs at the same time, so instead of running one job at a time you run several. You can achieve this in any of these ways:

1. Define a sequence job and, in it, start multiple jobs at once.
2. If you are running the same job multiple times, make the job multiple-instance enabled and call it several times in the sequence job, each with a different invocation ID.
3. Schedule multiple jobs at once.
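Options 2 and 3 can be sketched outside DataStage too. The following is a minimal Python illustration (the thread's UNIX wrapper is not shown, so this is not it): it builds one run per file and executes them with at most `max_parallel` running at a time. It assumes the `dsjob` command-line client is on the PATH and that the job is multiple-instance enabled; the project name, job name, the `FILE_NAME` parameter and the `f1`, `f2`, ... invocation-ID scheme are all hypothetical, so check the `dsjob` syntax for your release.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def build_commands(project: str, job: str, files: Sequence[str]) -> List[List[str]]:
    """One dsjob command per file, each under its own invocation ID.

    Hypothetical: the job takes a FILE_NAME parameter, and invocation IDs
    are "f1", "f2", ...; "job.invocationid" addresses one instance of a
    multiple-instance job.
    """
    return [
        ["dsjob", "-run", "-jobstatus", "-param", f"FILE_NAME={path}",
         project, f"{job}.f{i}"]
        for i, path in enumerate(files, start=1)
    ]


def run_command(cmd: List[str]) -> int:
    # With -jobstatus, dsjob waits for the job to finish, so its exit code
    # reflects the job's finishing status rather than a plain 0/1
    # (check the code mapping for your release).
    return subprocess.run(cmd).returncode


def run_parallel(
    commands: Sequence[List[str]],
    max_parallel: int = 10,
    runner: Callable[[List[str]], int] = run_command,
) -> List[int]:
    """Run every command, at most max_parallel at a time.

    Results come back in the same order as `commands`, whatever order the
    runs actually finish in.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(runner, commands))
```

For example, `run_parallel(build_commands("MYPROJECT", "LoadFlatFile", files), max_parallel=10)` would work through the 300 files ten at a time. Threads are sufficient here because each worker only waits on an external `dsjob` process; the `runner` parameter lets you swap in a different launcher or a stub for testing.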
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Get yourself some real job control, something that will allow you to define dependencies and will keep as many jobs running simultaneously as possible. Something like... oh, I don't know... our Ken Bland's Job Control Utilities that he gives away free from his website.
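The core idea behind that kind of job control can be sketched in a few lines. This is not Ken Bland's utilities or any DataStage API, just a minimal Python illustration under stated assumptions: `deps` maps each job name to the set of jobs that must finish first, and `run_job` is whatever actually starts a job and waits for it (for example, a hypothetical wrapper around `dsjob`). Each job starts as soon as its prerequisites are done, with at most `max_parallel` running at once.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Set


def run_with_dependencies(
    deps: Dict[str, Set[str]],
    run_job: Callable[[str], None],
    max_parallel: int = 4,
) -> List[str]:
    """Run every job in `deps` (job -> prerequisite jobs) and return the
    jobs in the order they finished.

    A job is started as soon as all of its prerequisites have finished,
    and no more than max_parallel jobs run at the same time.
    """
    # Copy the prerequisite sets so the caller's map is not modified.
    waiting = {job: set(pre) for job, pre in deps.items()}
    for job, pre in waiting.items():
        unknown = pre.difference(waiting)
        if unknown:
            raise ValueError(f"{job} depends on unknown job(s): {sorted(unknown)}")

    finished: List[str] = []
    running: Dict[Future, str] = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        while waiting or running:
            # Fill free slots with jobs whose prerequisites are all done.
            for job in sorted(j for j, pre in waiting.items() if not pre):
                if len(running) >= max_parallel:
                    break
                del waiting[job]
                running[pool.submit(run_job, job)] = job
            if not running:
                # Jobs remain but none can start: they depend on each other.
                raise ValueError(f"dependency cycle among: {sorted(waiting)}")
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                job = running.pop(fut)
                # Re-raise if the job failed; the pool still waits for
                # jobs that are already running before the error escapes.
                fut.result()
                finished.append(job)
                for pre in waiting.values():
                    pre.discard(job)
    return finished
```

Jobs with no dependencies between them, such as the 300 independent file loads in the original question, all start immediately up to the limit; the dependency map only matters when one load must wait for another.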
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Take an Economics 101 class, particularly the part where they talk about supply and demand.

You must monitor your jobs under normal (baseline) conditions, and perhaps also running in isolation under full data load. Then, from these measures, calculate the maximum demand you can place on the system, and schedule accordingly.
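Once you have the baseline figures, the "maximum demand" calculation is simple arithmetic. A minimal sketch: for each resource you measured, divide the capacity you are willing to spend by what one job uses in isolation, and take the smallest result. The resource names are hypothetical, and the approach assumes demand adds up linearly as jobs are added, which is only a first approximation (contention on disk or in the database usually makes things worse).

```python
import math
from typing import Mapping


def max_concurrent_jobs(per_job: Mapping[str, float],
                        available: Mapping[str, float]) -> int:
    """Largest number of identical jobs whose combined demand fits every
    measured resource, assuming demand grows linearly with job count.

    per_job   -- demand of one job, measured while it runs in isolation
    available -- capacity you are prepared to give these jobs, after
                 subtracting the system's normal (baseline) load
    """
    limits = []
    for resource, demand in per_job.items():
        if demand <= 0:
            continue  # a resource the job does not use imposes no limit
        # A resource missing from `available` raises KeyError on purpose.
        limits.append(math.floor(available[resource] / demand))
    if not limits:
        raise ValueError("no positive per-job demand was measured")
    return max(0, min(limits))
```

With the roughly 5% CPU per job reported in the original post and an 80% CPU ceiling, `max_concurrent_jobs({"cpu_pct": 5.0}, {"cpu_pct": 80.0})` gives 16. If each job also held 2 Oracle sessions out of 20 you can spare (hypothetical figures), adding `"oracle_sessions"` to both maps would cap it at 10, which is why the scarcest resource, not just CPU, should set the schedule.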

The Resource Estimator tool you get in version 8.0 is particularly useful in this task. It will estimate resource consumption either from the job design or by running a sample of rows. Make sure it is a statistically meaningful sample size if you go this route.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.