Run out of resources
Posted: Sat Oct 13, 2007 1:09 am
Hi, I am working on a project that has about 400 jobs, all of them parallel jobs. We have organized these jobs into 20 sequence jobs. Within each sequence job, the parallel jobs are arranged according to their dependencies. The 20 sequence jobs are scheduled to run in parallel.
However, when the 20 sequence jobs run in parallel, some of the parallel jobs fail suddenly. The failing jobs are not always the same: one time A fails and B succeeds, the next time A succeeds and B fails.
From our analysis, we think the cause may be that system resources are exhausted because too many parallel jobs are running at the same time.
We now want to limit the number of jobs that run at the same time. We do not want to change the sequence jobs; instead, we want the following behaviour: once a job's dependency conditions are fulfilled, and before it starts, check how many jobs are currently running. If that number has reached the maximum, sleep and check again later; otherwise run the job.
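To make the intended behaviour concrete, here is a minimal Python sketch of the check-then-sleep logic. It is only an illustration and not DataStage code: `JobThrottle`, `max_running` and `poll_seconds` are hypothetical names, and the "jobs" are plain Python callables run from threads inside one process.

```python
import threading
import time


class JobThrottle:
    """Illustrative gate: at most `max_running` jobs run at the same time."""

    def __init__(self, max_running, poll_seconds=0.01):
        self.max_running = max_running    # hypothetical limit on concurrent jobs
        self.poll_seconds = poll_seconds  # how long to sleep before re-checking
        self.running = 0                  # jobs currently running
        self.peak = 0                     # highest concurrency seen so far
        self._lock = threading.Lock()

    def _try_start(self):
        # The check and the increment happen under one lock, so two jobs
        # cannot both see a free slot and take it at the same time.
        with self._lock:
            if self.running >= self.max_running:
                return False
            self.running += 1
            self.peak = max(self.peak, self.running)
            return True

    def _finish(self):
        with self._lock:
            self.running -= 1

    def run(self, job):
        # Limit reached: sleep, then check again.
        while not self._try_start():
            time.sleep(self.poll_seconds)
        try:
            return job()
        finally:
            # Free the slot even if the job raised an exception.
            self._finish()
```

Each caller would wrap its job as `throttle.run(job)` on a shared `JobThrottle` instance. In a real setup the running count would have to live somewhere that all sequence jobs can see, which this single-process sketch does not cover.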
Does anyone know how to implement this? Or does anyone have another solution?
Thanks!