Maximum stages in a job

Post questions here related to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

rkdatastage
Participant
Posts: 107
Joined: Wed Sep 29, 2004 10:15 am

Maximum stages in a job

Post by rkdatastage »

Hi

Can anyone clear up a doubt for me: is there any limit on the number of stages in a job? I want to design a job that will use a large number of stages.
Is it the correct approach to design the most complex job as a single job, or should it be divided into multiple jobs?
Is there a limit on the stages in a DataStage job, i.e. a maximum number of stages that can be used in a job?

An early response would be appreciated.
Thanks in advance

RK
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

There is always going to be some limit on the number of stages a job can support - but this is far, far higher than any acceptable job you will write.

Put yourself in the shoes of someone opening up your job and trying to understand it in order to make some changes. If you have 150 stages and lines going all over the place it is going to be quite difficult to understand. If the job has only 20 stages it is much easier to understand, plus it fits on one or two pages on the display canvas in the designer.

Many developers have a personal maximum number of stages; I prefer to have a limit on job complexity. If I can't grasp an overview from the designer canvas the job is too complex. Some jobs cannot be easily split across several jobs without paying a performance price, so those can (and should) remain as they are, but most jobs can be split - especially if the output of one is a named pipe that is used as the input to another.
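The idea of splitting a job so that the output of one is a named pipe read by another can be illustrated outside DataStage. Below is a minimal sketch, assuming a POSIX system (`os.mkfifo` is not available on Windows). The functions `producer_job`, `consumer_job` and `run_split_pipeline` are hypothetical stand-ins for two small jobs. It shows only the pipe mechanics (the downstream side consumes rows while the upstream side is still writing, with no landed intermediate file), not how DataStage itself configures a pipe-backed stage.

```python
import os
import tempfile
import threading


def producer_job(pipe_path, rows):
    """Hypothetical upstream 'job': writes delimited rows into the named pipe."""
    # Opening a FIFO for writing blocks until a reader opens the other end.
    with open(pipe_path, "w") as pipe:
        for row in rows:
            pipe.write(",".join(row) + "\n")


def consumer_job(pipe_path):
    """Hypothetical downstream 'job': reads rows as the producer emits them."""
    loaded = []
    with open(pipe_path, "r") as pipe:
        for line in pipe:  # iteration ends when the writer closes its end
            loaded.append(line.rstrip("\n").split(","))
    return loaded


def run_split_pipeline(rows):
    """Run both 'jobs' concurrently, connected only by a named pipe."""
    workdir = tempfile.mkdtemp()
    pipe_path = os.path.join(workdir, "stage_link.fifo")
    os.mkfifo(pipe_path)  # POSIX only
    try:
        writer = threading.Thread(target=producer_job, args=(pipe_path, rows))
        writer.start()
        result = consumer_job(pipe_path)
        writer.join()
        return result
    finally:
        os.remove(pipe_path)
        os.rmdir(workdir)


if __name__ == "__main__":
    print(run_split_pipeline([["1", "alice"], ["2", "bob"]]))
```

Because both ends run at the same time, the split costs little extra I/O, which is why a pipe tends to be preferred over a landed file when the goal is simply to break up an oversized job.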
loveojha2
Participant
Posts: 362
Joined: Thu May 26, 2005 12:59 am

Post by loveojha2 »

Beyond that, you can use Local Containers, which make the job easier to understand visually.
Success consists of getting up just one more time than you fall.
RayNother
Participant
Posts: 11
Joined: Tue Sep 27, 2005 5:05 am
Location: UK

Post by RayNother »

I always keep jobs as small as possible and then put them all into one or more bigger sequences.
I find it easier to fault-find when supporting jobs in testing/production this way...

IMHO you should always keep things simple.

Ray
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

Perhaps we could offer a bottle of Champagne as a prize to whoever has the job with the most stages running in production.

Wolfgang
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Wolfgang,

I know I've seen jobs with over 100 stages. Don't know if they ever ran, though...
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

Arnd,

I had a job at e-plus in Dusseldorf with 115 stages to populate a dimension branch, which worked fine in production and took 5 minutes to finish.

Who can top that number?

Wolfgang

PS: Granted, that was in the last century; now I am a bit more experienced and my aim is to use a minimum of stages for each task.
ravij
Premium Member
Posts: 170
Joined: Mon Oct 10, 2005 7:04 am
Location: India

Re: Maximum stages in a job

Post by ravij »

Hi

There is no limit on the number of stages in a single job, but if you use more stages in a single job it will become complex and confusing. If you split the whole job into smaller jobs, it will be easier for you to handle exceptions and errors.

bye
JRK
koolnitz
Participant
Posts: 138
Joined: Wed Sep 07, 2005 5:39 am

Post by koolnitz »

Guys,

Recently, I discussed this topic with one of the DS consultants. He advised the same thing that all of you are recommending.

In a nutshell, whenever it's possible to break up a complex job, go for it. At the same time, I fully agree with Arnd that if "divide and rule" hampers performance, then it is worth keeping all the fruits on one tree.

Well, I personally prefer to have at most 20-22 stages in a job.

Cheers!!
Nitin Jain | India

If everything seems to be going well, you have obviously overlooked something.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Forget about using DataStage for a moment.

In the world of writing computer programs, is it "best" to have a single 5000 line top-down program, or a collection of small modular routines, methods, and procedures that may reach 8000 lines of code?

Best - define it. Best design to maintain? Best design for performance? Best design for the next guy? Best design for time-to-develop? Best design for trouble-shooting?

In my opinion, it's not a competition to see who can architect an ETL application in the fewest jobs. It's a competition to see whose architecture lasts for years without every job having to be rewritten for each enhancement or change.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There is an excellent book out there called The Elements of Programming Style - although it concentrates on language-based coding there is a lot of good advice, most of which can be generalized to graphical programming. Modularity is one of the main principles espoused, primarily for ease of understanding, re-use and maintenance.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
clshore
Charter Member
Posts: 115
Joined: Tue Oct 21, 2003 11:45 am

Post by clshore »

I was on an EE assignment where the number of stages exceeded 1,000 in several jobs.
My firm was called in midway through the project, after the jobs were written, to help resolve 'some issues'.
After some modifications, and much tweaking of kernel, memory, and disk resources, the jobs did actually run and satisfy requirements.
Working with the jobs in Designer was challenging. They took a long time to load. Viewing the whole job, most of the stage icons were so small that they could not be discerned on the palette as anything but blobs. When panning or zooming, the refresh took so long that it was painful.
It's not the way I would do it, but it's what the client created, and wanted, and it meets their needs.

Carter
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

O. M. G.
-craig

"You can never have too many knives" -- Logan Nine Fingers
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

On my current assignment I'm here to oversee, on behalf of the client, the vendor that is/was doing the actual design/build work on a 7.1 SE system. I came up with some naming standards and some design standards before they started work.

I then told them that IMHO maintainability is more important than performance, unless performance causes a major blockage, and then we'll take it on a case-by-case basis.

The result: fairly efficient jobs that are easy to read (left to right, inputs on top and right, outputs left and down), stage names that have a meaning, and thanks to the DS DOCO maker we have automatic documentation.

90% of the jobs passed QA the first time, 100% the second time, and most of the problems were a hash file not being cached (part of the standards).

My answer to the OP: maintainability first. If you can't read it, the next guy in 6 months can't fix it. This goes for all languages, not just DS.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Some things to consider when opting for the "all-in-one" jobs for both PX and Server:

1. Large design size means importing/exporting the job takes longer and produces a larger dsx file for a single job.
2. Large designs mean that more logic is within a single job, and only one developer can have write access to a job (even Hawk limits changes to one user, although others have read-only access).
3. All-in-one designs are non-recoverable from waypoints in the logic. There are no points for resumption of processing in the event of failure.
4. All-in-one designs are often nearly incomprehensible, even though the graphical metaphor is supposed to let someone see "at a glance" what the job is doing.
5. All-in-one designs usually mean that, to troubleshoot, the job has to be "exploded" into smaller constituent jobs just to figure out where the data is going "bad" during processing.
6. Sometimes, the job has to be "exploded" just so that a surgical enhancement can be made, and then reconstituted into the all-in-one form.
7. The all-in-one design sometimes prevents another job from running, because a lookup (either hash file or dataset) built inside it needs to be reused by the other job, so a dependency is imposed. The alternative is that the same logic exists in two all-in-one jobs, doubling resource consumption for that portion, or, even worse, two unrelated jobs are coupled because of that one common lookup.

The alternative architecture has its issues as well:

1. Small, modular jobs mean that processing activities are separate jobs, requiring more sophisticated use of Sequencers or custom job control to execute jobs in a dependent and, hopefully, concurrent fashion.
2. More jobs mean that a method of passing data between jobs has to be established: files or pipes.
3. Smaller jobs mean that the design library contains significantly more objects, so naming conventions and foldering become more important.
4. More jobs mean that careful documentation is required to piece together the now broken-apart flow.
5. Data lineage becomes more difficult, as tracing a target column's value back to its point of origin requires traversing stages and jobs, not just stages.
6. More jobs mean that version management is more complicated, as every job in a transformation jobstream (batch?) has to be at the correct version.
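Two of the points above, recovery waypoints (missing from all-in-one designs) and the need for Sequencers or custom job control to run dependent small jobs, can be sketched generically. The Python below is a hypothetical job-control loop, not DataStage's Sequencer or job-control API. It runs small jobs in dependency order using `graphlib.TopologicalSorter` (Python 3.9+) and records each completed job in a JSON checkpoint file, so a rerun after an abort resumes at the failed job instead of starting over. Job names and the checkpoint format are illustrative assumptions, and jobs run one at a time (concurrent execution of independent branches is left out for brevity).

```python
import json
import os
from graphlib import TopologicalSorter


def run_sequence(jobs, dependencies, checkpoint_path):
    """Run hypothetical small jobs in dependency order, skipping completed ones.

    jobs: dict of job name -> callable (the job body)
    dependencies: dict of job name -> set of job names that must finish first
    checkpoint_path: JSON file listing jobs already completed (the "waypoints")
    Returns the names of the jobs actually executed in this run.
    """
    done = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = set(json.load(f))

    # Include every job in the graph, even ones with no dependencies.
    graph = {name: set(dependencies.get(name, set())) for name in jobs}

    ran = []
    for name in TopologicalSorter(graph).static_order():
        if name in done:
            continue  # resume point: finished successfully in an earlier run
        jobs[name]()  # an exception stops the run; the checkpoint keeps prior progress
        done.add(name)
        ran.append(name)
        with open(checkpoint_path, "w") as f:
            json.dump(sorted(done), f)
    return ran


if __name__ == "__main__":
    import tempfile

    demo_jobs = {
        "extract_customers": lambda: print("extracting"),
        "build_lookup": lambda: print("building shared lookup"),
        "load_fact": lambda: print("loading fact table"),
    }
    demo_deps = {"build_lookup": {"extract_customers"}, "load_fact": {"build_lookup"}}
    checkpoint = os.path.join(tempfile.mkdtemp(), "sequence_checkpoint.json")
    print(run_sequence(demo_jobs, demo_deps, checkpoint))
```

Note how `build_lookup` becomes a separate job that any number of downstream jobs can depend on, rather than being rebuilt inside each all-in-one job. The price is the extra control logic and checkpoint housekeeping listed as drawbacks above.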
Kenneth Bland