Limiting Job Parallelism

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

goffinw
Participant
Posts: 27
Joined: Thu Nov 18, 2004 6:50 am
Location: Belgium

Limiting Job Parallelism

Post by goffinw »

I wrote a Sequence Job, in which many Server Jobs run in parallel.
This design corresponds to the functional requirements of the project, but when all these Server Jobs start simultaneously, the accessed database is overloaded.
I would like to keep the parallel design in the Sequence Job, but then I'd need a means to limit the number of Server Jobs that may be active at any time.
Does there exist a technique to do this?

Thanks in advance,
Wim
wnogalski
Charter Member
Posts: 54
Joined: Thu Jan 06, 2005 10:49 am
Location: Warsaw

Post by wnogalski »

IMHO you should redesign the sequence to run only as many parallel jobs as won't overload the DB and the ETL server.
Regards,
Wojciech Nogalski
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Limiting Job Parallelism

Post by chulett »

goffinw wrote:Does there exist a technique to do this?
Sure, but it involves a lot of hand coding. You need to be able to start X jobs, constantly loop through and monitor them and - when one finishes - throw another job into the pot. Keep doing that until all jobs are finished.
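That start-monitor-refill loop can be sketched in Python. This is only a simulation of the control logic: the job names are made up, and `start_job`/`finished_jobs` are placeholders for what in DataStage would be DSRunJob/DSGetJobInfo calls in BASIC, or dsjob from a script.

```python
import random
import time

MAX_PARALLEL = 3                          # assumed limit the database tolerates
pending = [f"job_{i}" for i in range(10)]  # hypothetical job names
running = {}                               # job name -> simulated finish time
done = []

def start_job(name):
    # Placeholder: in DataStage this would attach and start the job.
    running[name] = time.monotonic() + random.uniform(0.01, 0.05)

def finished_jobs():
    # Placeholder: in DataStage this would poll each job's status.
    now = time.monotonic()
    return [j for j, t in running.items() if t <= now]

while pending or running:
    # Top up to the limit: as soon as a slot is free, start the next job.
    while pending and len(running) < MAX_PARALLEL:
        start_job(pending.pop(0))
    # Poll: when a job finishes, free its slot.
    for j in finished_jobs():
        del running[j]
        done.append(j)
    time.sleep(0.005)

print(done)
```

The key point is that at most MAX_PARALLEL jobs are ever in flight, but a new one starts the moment any running job completes - unlike the batch-of-ten approach below.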

For the Sequencer route, something a little less sophisticated would be to decide on your maximum number of jobs. Have it run that many and then link them all to a Sequencer stage set to 'All'. From there, link to another X jobs. Lather, rinse, repeat. If you've decided the maximum number of jobs you could run at one time was 10, let's say, then it would run 10 jobs and, when all 10 were done, run 10 more jobs, etc.

Not quite the same thing as always keeping 10 jobs running simultaneously, but it's all you are going to be able to accomplish in a Sequencer.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chucksmith
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

Create a sequencer with a single RoutineActivity stage and one job parameter, SleepSeconds. The RoutineActivity stage should call the Basic sleep statement.

Now, insert a JobActivity stage before each of the JobActivity stages in your current sequence, to execute your new sleep sequencer job (created above). Set the SleepSeconds parameter to a different value in each of these JobActivity stages.

This should stagger the start of each job. Let me know how this works out.
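The staggering effect can be sketched like this, with Python threads standing in for the parallel Job Activities; the STAGGER value and job names are made up, and the `time.sleep(delay)` plays the role of the sleep sequencer:

```python
import threading
import time

STAGGER = 0.02    # hypothetical per-job offset, like SleepSeconds
start_times = {}  # records when each "real job" would have started

def run_job(name, delay):
    time.sleep(delay)                      # the "sleep sequencer" step
    start_times[name] = time.monotonic()   # the real job would start here

threads = [
    threading.Thread(target=run_job, args=(f"job_{i}", i * STAGGER))
    for i in range(5)
]
for t in threads:
    t.start()   # all launched together, like the parallel sequence
for t in threads:
    t.join()

# The sleeps spread the starts out instead of hitting the DB all at once.
order = sorted(start_times, key=start_times.get)
print(order)
```

Note this only spreads out the start times; it doesn't cap how many jobs end up running concurrently once they are all underway.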
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Just create N parallel streams of jobs in the job sequence. Based on previous experience (from Director), try to make all streams contain jobs that give approximately the same total elapsed time.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hmmm... pretty sure that's what I said. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
goffinw
Participant
Posts: 27
Joined: Thu Nov 18, 2004 6:50 am
Location: Belgium

Post by goffinw »

Thanks everybody,

I understand now. I'll control the 'N'-parallelism explicitly, without complex logic and without sleeps.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

chulett wrote:Hmmm... pretty sure that's what I said. :wink:
Nope. Your design, because of the All sequencers, waits till all ten have finished before starting the next ten.

A design with ten parallel streams of Job Activities - no Sequencer stages needed - will sustain execution of ten jobs/job sequences for as long as this is possible.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Oh... gotcha! :oops: I didn't quite follow what you meant by 'ten parallel streams' for some reason. Cool. 8)
-craig

"You can never have too many knives" -- Logan Nine Fingers
mangrick
Participant
Posts: 10
Joined: Fri May 28, 2004 6:09 am
Location: München

Post by mangrick »

Yes, but with the N parallel streams of jobs you can't control dependencies between jobs of different streams, right? E.g. populating all staging tables before any target table ...

I wonder if there is really no possibility for that kind of control in DataStage. What about an external scheduler, e.g. Tivoli?
Has someone a hint?

I think there are other ETL tools out there that can at least limit the number of parallel jobs.

M.A.
Regards,
Mathias Angrick
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

mangrick wrote:I wonder if there is really no possibility for that kind of control in DataStage. What about an external scheduler, e.g. Tivoli?
Has someone a hint?
Sure, and it's been mentioned. Hand code. The ones that I know of - Peter has written one, Kenneth has, even I have on a smaller scale.

I suppose you could do something sort of like it via an external scheduler, but it would be painful as heck to set up... or change. We use Control-M and it certainly can handle dependencies, but we only use it at the 'macro' or high level to start our load balancing process - not to do the actual load balancing... which I doubt it could be set up to handle.
-craig

"You can never have too many knives" -- Logan Nine Fingers
clshore
Charter Member
Posts: 115
Joined: Tue Oct 21, 2003 11:45 am

Post by clshore »

If you are talking about commercial job schedulers, then most have some notion of resource allocation.
You can, for example, define a resource and its max load (100, for example), then associate a resource load value with each job (10, for example). The scheduler will wait until sufficient resource is available before launching the job.
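That resource accounting can be sketched as follows. The numbers mirror the example above (a resource with max load 100, each job consuming 10, so at most 10 jobs hold the resource at once); everything else here is an illustrative stand-in, not any particular scheduler's API.

```python
import threading

MAX_LOAD = 100   # the resource's defined maximum load
JOB_LOAD = 10    # load each job consumes while it runs

available = MAX_LOAD
lock = threading.Condition()
active = 0       # jobs currently holding the resource
peak = 0         # highest concurrency observed

def run_job(name):
    global available, active, peak
    with lock:
        # The scheduler waits until sufficient resource is free.
        while available < JOB_LOAD:
            lock.wait()
        available -= JOB_LOAD
        active += 1
        peak = max(peak, active)
    # ... the actual job would run here, outside the lock ...
    with lock:
        available += JOB_LOAD
        active -= 1
        lock.notify_all()   # wake jobs waiting on the resource

threads = [threading.Thread(target=run_job, args=(f"job_{i}",))
           for i in range(25)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(peak)  # never exceeds MAX_LOAD // JOB_LOAD
```

Varying JOB_LOAD per job is what makes this richer than a plain job-count cap: a heavy job can be given a load of 50 and automatically squeeze out four light ones.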
The down side is that you have to break your jobs into individual modules that are callable from the external OS, and make sure that your inter-job dependencies are externalized and available to the scheduler.
But many large enterprises will require that you do this anyway, in order to make it possible for operations folks to run and administer the jobs without being DataStage aware (and without having to call you in the middle of the night, just to re-start a job).

Carter
goffinw
Participant
Posts: 27
Joined: Thu Nov 18, 2004 6:50 am
Location: Belgium

Post by goffinw »

In the latest posts, the question is raised on what is or should be the relation between DataStage and an external scheduler.
From my experience in ETL, it is a BIG plus if an ETL tool has its own scheduler, because it is the same people who set up the system, who receive the functional AND operational requirements for the ETL, who are responsible for the design and development of the ETL jobs, and who finally deliver the ETL work that needs to run in production.
If these people cannot develop the scheduling of this ETL themselves, or if this scheduling development is not easy, then you have a big extra overhead on the project. And enterprise-class scheduling is always required in the environments where DataStage is deployed.
Therefore, either one of these is required:
  • The scheduling functions of DataStage should be enriched
  • A tight integration of the DataStage Jobs with existing scheduling tools should be available
Does anyone know the best available options?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

DataStage is "integrated" - as much as it can be while preserving functionality across UNIX and Windows platforms - with the operating system's at scheduler.

There is an easy-to-use interface to at in the Director client. With this you can do pretty much anything that at can do, on both UNIX and Windows operating systems.

If you want to use a third party scheduler that's your choice, and there is a command line interface called dsjob to DataStage that allows jobs to be executed and their status interrogated.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
goffinw
Participant
Posts: 27
Joined: Thu Nov 18, 2004 6:50 am
Location: Belgium

Post by goffinw »

Ray,

indeed. I like it that DataStage is integrated as much as possible. But it doesn't have a scheduler/monitor that meets requirements such as the one discussed earlier in this topic. Would it be a priority for Ascential to add more functionality (like this one) into the scheduler?
Post Reply