Run out of resources

donhoff
Participant
Posts: 8
Joined: Fri Sep 07, 2007 9:58 pm

Post by donhoff »

Hi, I am on a project with about 400 jobs, all of them parallel jobs. We have organized them into 20 sequence jobs. Within each sequence job, the parallel jobs are arranged according to their dependencies. The 20 sequence jobs are scheduled to run in parallel.

However, when the 20 sequence jobs run in parallel, some parallel jobs fail suddenly. The failed jobs are not always the same: in one run A fails and B succeeds, and in the next run A succeeds and B fails.

Based on our analysis, we think the cause may be that system resources are exhausted because too many parallel jobs are running at the same time.

Now we want to limit the number of jobs running at the same time. We do not want to change the sequence jobs; instead we want the following behavior: once a job's dependency conditions are fulfilled, before it runs it first checks how many jobs are currently running. If that number exceeds the maximum, it sleeps; otherwise it runs.

Does anyone know how to implement this? Or does anyone have another solution?

Thanks!
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

There are many ways to approach this "throttling" of resources, but none of them are built into the product.

One simple (but not necessarily 100% effective) approach is to add a call to a user-written before-job routine in every PX job. This could easily be added to all of them by exporting the project and doing a global edit of the .dsx file.

The before-job subroutine (written in DS/BASIC, even for EE jobs) would shell out to UNIX and use the ps -ef command to count either orchestrate sessions or DataStage userids (your choice in how to count) and if the number exceeds some predetermined amount it would SLEEP and try again.
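
The count-and-sleep logic described above can be sketched as follows. This is only an illustration in Python, not the DS/BASIC before-job routine itself; the function names, the "osh" match pattern, the limit of 50 and the 30-second poll interval are all assumptions chosen for the example.

```python
import subprocess
import time


def count_matching_processes(ps_output, pattern):
    """Count lines of `ps -ef` output that contain the given pattern."""
    return sum(1 for line in ps_output.splitlines() if pattern in line)


def running_px_processes(pattern="osh"):
    """Shell out to `ps -ef` and count matching lines.

    The "osh" pattern is an assumption; match on a userid instead
    if that is how you prefer to count.
    """
    result = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True)
    return count_matching_processes(result.stdout, pattern)


def wait_for_slot(count_running=running_px_processes, max_running=50,
                  poll_seconds=30, sleep=time.sleep):
    """Sleep and re-check while the running count exceeds max_running.

    Returns the number of times it had to sleep.
    """
    polls = 0
    while count_running() > max_running:
        sleep(poll_seconds)
        polls += 1
    return polls
```

Note that a plain substring match is crude: it also counts any unrelated command line that happens to contain the pattern, so the pattern would need tuning for a real system.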

This process would need to be refined so that a process won't wait too long and you would also need to decide how to prioritize processes. You could add in calls to see how busy the system really is (e.g., if you have 200 PX processes running but only have 50% CPU load you could start more).
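
A rough sketch of those two refinements (a cap on the total wait, plus a check of how busy the machine is) might look like this in Python. All names and thresholds are hypothetical, and the 1-minute load average per CPU is used here only as a stand-in for "CPU load"; it is not the same measurement. Prioritization between waiting jobs is not covered.

```python
import os
import time


def may_start(running, max_running, load_ratio, busy_ratio=0.5):
    """Allow a start when under the process cap, or when the host is lightly loaded."""
    return running <= max_running or load_ratio < busy_ratio


def load_per_cpu():
    """1-minute load average divided by CPU count (Unix only; an approximation)."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)


def wait_bounded(count_running, max_running, max_wait_seconds=1800, poll_seconds=30,
                 load_ratio=load_per_cpu, sleep=time.sleep):
    """Return True if the job may start, False if the wait budget ran out."""
    waited = 0
    while not may_start(count_running(), max_running, load_ratio()):
        if waited >= max_wait_seconds:
            return False  # caller decides: run anyway, or fail the job
        sleep(poll_seconds)
        waited += poll_seconds
    return True
```

What to do when the wait budget runs out (start the job anyway or abort it) is left to the caller, since that is a policy decision for the project.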

It would make more sense to put this kind of load limiting into your sequences, so that the job isn't started until sufficient resources are available; but the before-job method would work as well.