
Acceptable number of stages in a job

Posted: Thu Dec 04, 2014 9:49 am
by le thuong
We are having a discussion about the optimal number of stages in a job. With a complex business / functional requirement, we quickly reach a job with over 50 stages, and sometimes we end up with a failure (fork() failed, Not enough space). As a workaround, we have to split the job into 2 jobs (or more). The price to pay is writing an intermediate data set at the end of Job 1 and reading it back in Job 2, instead of processing everything in a single job. We run with 4 nodes.
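For reference, one way to chain the two halves is a small wrapper script around dsjob. This is only a sketch: the project name, job names and the behaviour comments are placeholders, not our actual jobs.

#!/bin/ksh
# Sketch: run the split pipeline as two jobs, with an intermediate data set between them.
PROJECT=MyProject

# Job 1 ends in a Data Set stage that writes the intermediate data set.
dsjob -run -jobstatus "$PROJECT" Job1_Transform
rc=$?
# With -jobstatus the exit code reflects the job's finishing status
# (1 = Finished OK, 2 = Finished with warnings); anything else is treated as a failure here.
if [ $rc -ne 1 ] && [ $rc -ne 2 ]; then
    echo "Job1_Transform failed (status $rc), not starting Job 2" >&2
    exit 1
fi

# Job 2 starts from a Data Set stage that reads the same data set.
dsjob -run -jobstatus "$PROJECT" Job2_Load
rc=$?
if [ $rc -ne 1 ] && [ $rc -ne 2 ]; then
    echo "Job2_Load failed (status $rc)" >&2
    exit 1
fi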

Posted: Thu Dec 04, 2014 12:15 pm
by priyadarshikunal
There is no such optimal number / magic number in my opinion. It depends on the server and the resources available. You have to balance resource usage against job modularity: split into too many jobs and the design becomes less manageable than it would be with fewer jobs. If you are landing data in data sets, then apart from the disk I/O, the other overheads can be avoided. A job with more stages will create more processes and will need more memory, even while a stage is just waiting for data to arrive. Complex jobs are also harder to understand and to debug.

I generally go with 10-15 stages, with some exceptions, but that's my opinion and you are free to question it.
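To put rough numbers on the process count: assuming one player process per stage per node (and ignoring operator combination and stages that run on fewer nodes), a 50-stage job on 4 nodes starts on the order of 50 x 4 = 200 player processes, plus a section leader per node and the conductor, whereas a 10-15 stage job stays in the range of roughly 40-60. That is the kind of difference that can show up as fork() or memory failures on a constrained server.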

Posted: Thu Dec 04, 2014 1:24 pm
by qt_ky
A job you develop may have 100+ stages and you may know it very well, like the back of your hand, at least until you move on to other work. I hope you used a lot of helpful annotations for the next person who comes along to support it after you move on, or to help yourself months or years down the road when it needs enhancements. If you were the person walking into a new shop, how complex would you want the job designs you are introduced to be?

Posted: Thu Dec 04, 2014 1:47 pm
by FranklinE
A piece of wisdom I learned as a mainframe batch developer: the balance point is not the number of processes inside a given job (be it COBOL or DataStage) but the accessibility and recoverability of its points of failure.

As a rule of thumb, it has worked very well for me. My first design decision, once I know the requirements and their scope, is to identify the critical points of failure. A self-contained job with many internal steps gives you no usable recovery point, which hurts most when rerunning the job means redoing processes that are expensive in and of themselves.

I'm being vague here because every shop has its unique qualities. We are heavily dependent on files for many reasons, such as backups and data security. It may seem odd or even unnecessary, but we routinely extract data from a database to a file as a first step rather than read directly from the tables into the process. The database is a critical point of failure, and recoverability is much better when it is focused on a file than when there are multiple attempts to access the tables.
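As a sketch of that land-then-process pattern (the project, job and file names below are placeholders), a wrapper script can skip the extract on a rerun when the file has already been landed:

#!/bin/ksh
# Sketch: the database extract is its own restartable step, so a downstream
# failure never forces a re-read of the tables.
PROJECT=MyProject
EXTRACT=/data/land/customer_extract.txt

# Step 1: extract to a flat file, but only if this run has not already landed it.
if [ ! -s "$EXTRACT" ]; then
    dsjob -run -jobstatus -param ExtractFile="$EXTRACT" "$PROJECT" Job_Extract
    rc=$?
    if [ $rc -ne 1 ] && [ $rc -ne 2 ]; then
        echo "Extract failed (status $rc)" >&2
        exit 1
    fi
fi

# Step 2: everything downstream reads the landed file, so a rerun after a failure
# here skips step 1 and recovery focuses on the file rather than the database.
dsjob -run -jobstatus -param ExtractFile="$EXTRACT" "$PROJECT" Job_Process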

The environment is a spectrum of choices, not one thing or the other, all or nothing. As always, your mileage may vary. :lol:

Posted: Fri Dec 05, 2014 1:34 am
by priyadarshikunal
I second Franklin on the recoverability part, in addition to what I wrote above. And yes, shops get charged per mainframe minute, so recoverability is one of the main criteria there when designing; when the required data is already in a file, recovery is faster.

Posted: Fri Dec 05, 2014 8:44 am
by chulett
And static. :wink: