Acceptable number of stages in a job

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

le thuong
Premium Member
Posts: 76
Joined: Wed Sep 09, 2009 5:21 am

Post by le thuong

We are having a discussion about the optimal number of stages in a job. With a complex business/functional requirement, we quickly reach a job with over 50 stages, and sometimes we end up with a failure (fork() failed, Not enough space). As a workaround, we have to split the job into two (or more) jobs. The price to pay is writing an intermediate data set at the end of Job 1 and then reading it back in Job 2, instead of running a single job. We run with 4 nodes.
Thuong

best regards
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal

In my opinion, there is no optimal or magic number. It depends on the server and the resources available. You have to balance resource usage against job modularity: with too many jobs, the project is not as manageable as it would be with fewer jobs. If you are landing data in datasets, then apart from the disk I/O, other overheads can be avoided. A job with more stages creates more processes and needs more memory, even while a stage is just waiting for data to arrive. Complex jobs are also harder to understand and to debug.

I generally aim for 10-15 stages, with some exceptions, but that's my opinion and you are free to question it.
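To put rough numbers on the "more stages, more processes" point: a parallel job run is commonly described as one conductor process, one section leader per node, and one player process per stage per node, unless operator combination merges stages into a shared process. The Python below is only a back-of-the-envelope sketch of that model. `estimate_px_processes` and its `combined` parameter are hypothetical, and real counts will differ (sequential stages run once, some stages start extra processes).

```python
def estimate_px_processes(stages: int, nodes: int, combined: int = 0) -> int:
    """Rough estimate of OS processes for one parallel job run.

    Hypothetical helper, not an IBM formula. Assumes every stage runs on
    every node; `combined` is the number of stages folded into another
    stage's process by operator combination.
    """
    if stages < 0 or nodes < 1 or not 0 <= combined <= stages:
        raise ValueError("need stages >= 0, nodes >= 1, 0 <= combined <= stages")
    conductor = 1                            # one conductor per job run
    section_leaders = nodes                  # one section leader per node
    players = (stages - combined) * nodes    # one player per uncombined stage per node
    return conductor + section_leaders + players


if __name__ == "__main__":
    for stages in (15, 50):
        print(stages, "stages on 4 nodes ->", estimate_px_processes(stages, nodes=4), "processes")
```

Under this model, with the 4 nodes from the original post, 15 stages comes to about 65 processes and 50 stages to about 205, each needing its own memory. That is one plausible reason a large job starts hitting fork() failures on a busy server.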
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky

A job you develop may have 100+ stages and you may know it very well, like the back of your hand, that is until you move on to other work. I hope you used a lot of helpful annotations for the next person who comes along to support it after you move on, or to help yourself months or years down the road when it needs enhancements. If you are the person entering a new shop, how complex are the job designs you want to be introduced to?
Choose a job you love, and you will never have to work a day in your life. - Confucius
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE

A piece of wisdom I learned as a mainframe-batch developer: the balance point is not the number of processes inside a given job (be it COBOL or DataStage) but the accessibility to and recoverability of the points of failure.

As a rule of thumb, it has worked very well for me. My first design decision, after knowing the requirements and their scope, is identifying the critical points of failure. A self-contained job with many internal steps cannot have an intermediate recovery point, which matters most when rerunning the job means redoing processes that are expensive in and of themselves.

I'm being vague here because every shop has its unique qualities. We are heavily dependent on files for many reasons like backups and security of the data. It may seem odd or even unnecessary, but we routinely extract data from a database to a file as a first step rather than read directly from tables into the process. The database is a critical point of failure. Recoverability is much better when focused on a file than when there are multiple attempts to access the tables.
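Landing the extract in a file first is, in effect, a restart point. In DataStage that would be a Data Set or sequential file written by one job and read by the next. The Python below is only a language-neutral sketch of the idea: `run_with_landing`, `extract`, `transform` and the JSON landing file are hypothetical stand-ins, not DataStage features.

```python
import json
from pathlib import Path
from typing import Callable


def run_with_landing(landing: Path,
                     extract: Callable[[], list],
                     transform: Callable[[list], list]) -> list:
    """Hypothetical two-step run: extract -> landed file -> transform.

    If the landing file already exists (e.g. on a rerun after the
    transform failed), the extract step, and so the database access,
    is skipped and the transform reads the landed file instead.
    """
    if not landing.exists():
        rows = extract()                              # the expensive, fragile step
        tmp = landing.with_name(landing.name + ".tmp")
        tmp.write_text(json.dumps(rows))              # write under a temporary name...
        tmp.replace(landing)                          # ...then rename, so no half-written file is left behind
    rows = json.loads(landing.read_text())
    return transform(rows)
```

If the transform fails, the rerun calls `extract` zero more times because the landed file is already there. A real schedule would also need to remove or date-stamp the landing file before the next cycle so stale data is not reused.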

The environment is a spectrum of choices, not one thing or another thing, all or nothing. As always, your mileage may vary. :lol:
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal

I second Franklin's point about recoverability, in addition to what I wrote. And yes, shops get charged per mainframe minute, so recoverability is one of the main design criteria there: when the required data is already in a file, recovery is faster.
Priyadarshi Kunal
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett

And static. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers