Opinions on Job Sequences versus Basic Code Batch-sequences

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

peterbaun
Premium Member
Posts: 93
Joined: Thu Jul 10, 2003 5:27 am
Location: Denmark
Contact:

Opinions on Job Sequences versus Basic Code Batch-sequences

Post by peterbaun »

Hi -

I am just curious and would like to hear some opinions on whether you use Job Sequences or code batch sequences in Basic for your job-sequencing environment (or both). In general I would say that it is possible to build much more sophisticated batch environments with Basic code than with Job Sequences (...or?) - even though you can get pretty far with Job Sequences when you have a well-designed architecture.

Merry Christmas to everyone.

Regards
Peter
datastage
Participant
Posts: 229
Joined: Wed Oct 23, 2002 10:10 am
Location: Omaha

Re: Opinions on Job Sequences versus Basic Code Batch-sequences

Post by datastage »

peterbaun wrote: I am just curious and would like to hear some opinions on whether you use Job Sequences or code batch sequences in Basic for your job-sequencing environment (or both). In general I would say that it is possible to build much more sophisticated batch environments with Basic code than with Job Sequences (...or?) - even though you can get pretty far with Job Sequences when you have a well-designed architecture.
Sequencers are very easy for beginner and intermediate developers to understand. It's easy to follow the batch logic when viewing someone else's code, and the GUI, with its reusable components, speeds up development.

I think a good approach is to start with Job Sequencers as a default. When you need to do something more advanced or creative, shift to custom Basic coding. You can even call your custom Basic code from sequencers, keeping everything standard and easy to understand - but of course you should never be afraid to delve deep into a Basic program to get the job done.
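To illustrate the idea of calling custom Basic from a sequence, here is a minimal sketch of a server routine that a Routine Activity stage could invoke. This is an illustrative assumption, not code from this thread: the routine name `RunJobChecked` and its error handling are made up, though the `DSAttachJob`/`DSRunJob` API calls are the standard DataStage BASIC job-control interface.

```basic
* Hypothetical server routine, callable from a Routine Activity stage.
* The name RunJobChecked and the return convention are illustrative only.
FUNCTION RunJobChecked(JobName)
      hJob = DSAttachJob(JobName, DSJ.ERRFATAL)
      ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
      ErrCode = DSWaitForJob(hJob)
      Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
      ErrCode = DSDetachJob(hJob)
      If Status = DSJS.RUNOK Or Status = DSJS.RUNWARN Then
         Ans = 0        ;* success (possibly with warnings)
      End Else
         Ans = Status   ;* pass the failure status back to the sequence
      End
RETURN(Ans)
```

The sequence can then branch on the routine's return value just as it would on a Job Activity trigger.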
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You might be interested in reading this:

viewtopic.php?t=85210

I personally think that Sequences are simple in design but not scalable solutions. You ultimately end up nesting a lot of sequences inside sequences, and restart capability becomes muddied at that point. In addition, modifying a Sequence to add new jobs with dependencies can literally destroy the Sequence job, to the point that you're better off writing a new job rather than modifying the existing one. You are also limited to the functionality in the Sequencer; there is no enhancing or tweaking.

Parameter management is a big issue for every client of mine. You just have to see how parameters are handled; it can be difficult. Adding jobs to existing Sequencers is a trial in patience, as is refreshing the parameter list once you've added parameters to your jobs. A globally added parameter requires manually updating every Sequencer to pick it up and propagate it. Not fun.
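In Basic job control, by contrast, parameters can be set programmatically, so a globally added parameter needs one code change rather than edits to every Sequencer. A hedged sketch (the job name, parameter names, and values below are made up for illustration; `DSSetParam` is the standard call):

```basic
* Sketch: setting job parameters from Basic job control.
* "LoadCustomerDim", "TargetDB", and "RunDate" are illustrative names.
      hJob = DSAttachJob("LoadCustomerDim", DSJ.ERRFATAL)
      ErrCode = DSSetParam(hJob, "TargetDB", "DWPROD")
      ErrCode = DSSetParam(hJob, "RunDate", Oconv(Date(), "D-YMD[4,2,2]"))
      ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
```

A loop over a stored name/value list would extend this to any number of jobs without touching a Sequencer canvas.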
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
peterbaun
Premium Member
Posts: 93
Joined: Thu Jul 10, 2003 5:27 am
Location: Denmark
Contact:

Post by peterbaun »

Hello again -

Thanks for the input.

I must admit that I hadn't really noticed that many differences in job sequences between 5.x and 6.x (or 7.x for that matter). Currently I only have access to and work with 5.x (we will upgrade during the fall - and I am really looking forward to doing that!)

Kenneth - the description of your batch-execution environment in the thread you refer to really puts into perspective what you cannot do with job sequences.

Furthermore, I also see a weakness in that if you have a sequence that executes other sequences, you don't actually catch errors from the sub-sequences in the main sequence (possibly this has been changed in 7.x?)

Regards
Peter
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

peterbaun wrote: Furthermore, I also see a weakness in that if you have a sequence that executes other sequences, you don't actually catch errors from the sub-sequences in the main sequence (possibly this has been changed in 7.x?)
I am not sure if I read this correctly, but if you meant to ask whether sequencers capture failures from sub-sequencers -- that is possible in at least 6.x.

Whenever a job fails, the sequencer that calls the job should call a routine, "AbortToLog()". Thus, any sequencer that calls that sequencer will see that it aborted.
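The AbortToLog() routine itself isn't shown in this thread, so the body below is a guess at one plausible shape, assuming it simply logs a fatal message: `DSLogFatal` writes a fatal event to the job's log and aborts the calling job, which is what propagates the failure up to the parent sequencer.

```basic
* Guessed body for the AbortToLog() routine mentioned above -- not
* the actual source. DSLogFatal logs a fatal event and aborts the job.
FUNCTION AbortToLog(Message)
      Call DSLogFatal(Message, "AbortToLog")
      Ans = 0   ;* never reached: DSLogFatal aborts the calling job
RETURN(Ans)
```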

If you are referring to sub-sequencers having a programming error, that's something I am not too familiar with at the moment.

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

A master sequence that calls a job activity stage that is itself another sequence has access only to the information that the job activity stage can return. If you want to do further analysis - such as how long that job ran, what messages came up in its log, or what its link statistical performance was (was it outside the RI tolerance limits, say 10% of a lookup missing) - then you have no inherent facility in a Sequencer job design. You end up building a routine library that you can wrap around a job activity stage call to do this. You're then about 50% of the way towards what I've done.
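The kind of post-run checks such a wrapper routine could make can be sketched with the standard job-control calls. The stage name, link names, and the 10% tolerance below are illustrative assumptions; `hJob` is assumed to be an attached handle for the job that just finished.

```basic
* Sketch of post-run analysis a wrapper routine could perform.
* "xfmValidate", "lnkInput", and "lnkCustLookup" are made-up names.
      Status   = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
      Started  = DSGetJobInfo(hJob, DSJ.JOBSTARTTIMESTAMP)
      Finished = DSGetJobInfo(hJob, DSJ.JOBLASTTIMESTAMP)
      InRows  = DSGetLinkInfo(hJob, "xfmValidate", "lnkInput", DSJ.LINKROWCOUNT)
      RefMiss = DSGetLinkInfo(hJob, "xfmValidate", "lnkCustLookup", DSJ.LINKROWCOUNT)
      If InRows > 0 And (RefMiss / InRows) > 0.10 Then
         Call DSLogWarn("Lookup misses exceed 10% RI tolerance", "JobWrapper")
      End
```

None of this is available to a plain Job Activity trigger, which is the point being made above.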

Next, in the event of multi-layered sequences, you have no intercommunication between sub-layer sequences. For example, suppose you have a Master Sequence running three sub-sequences in parallel, each with 10 job stages and various degrees of serial and parallel job activities in their streams. In the event that one job fails, you may wish the ENTIRE stream, sequence independent, to stop at the next point where each sequence would start a new task.

Furthermore, resurrection capability of said jobstream, from last point of failure, across multi-layered sequences is, to be kind, difficult. If you're into process metadata at runtime being available to the job stream for decision making, this is not an option without again building said function library.

Lastly, a single icon in a sequence cannot represent a divide-n-conquer job instantiation, without putting down a job activity stage for each and every instance, complete with dependency links and full parameter entry. This can be tiresome if you do this quite often.

Everything I have mentioned here is bread-n-butter batch data warehouse ETL architecture. I try very hard never to use esoteric, classroom, or academic examples, but to refer to actual real-life implementations and their issues. I am a follower of Kimball, so the considerations mentioned here are part of the larger ETL infrastructure discussed in his Lifecycle Toolkit book.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

I think job sequences need some help. The visual aspects are important enough to fix. I think what you need is a combination of Parameter Manager and a couple of other things. You need a routine to start all jobs. This routine should know whether it is a rerun or a new run. A rerun should check whether the job has successfully finished since the last new run; if not, run it. If the job is not in a runnable state, reset it first. After the job finishes or aborts, its status is logged and returned to the sequence. If it aborts, all email notification should be done at this level and not in the sequence. Sequences get too cluttered if they have both a success link and a failure link. Basically, put all of Ken's logic from his very cool and complex batch program into a sequence, and you have the best of both worlds.
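The run/rerun/reset check described above could look roughly like this in Basic. This is a hedged sketch of my reading of that description, not a tested routine; the `DSJS.*` status constants and `DSJ.RUNRESET` are the standard job-control values, but the control flow is illustrative.

```basic
* Rough sketch of the "smart start" logic described above.
      hJob = DSAttachJob(JobName, DSJ.ERRNONE)
      Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
      If Status = DSJS.RUNFAILED Or Status = DSJS.CRASHED Or Status = DSJS.STOPPED Then
         * Job is not in a runnable state: reset it before the real run
         ErrCode = DSRunJob(hJob, DSJ.RUNRESET)
         ErrCode = DSWaitForJob(hJob)
         ErrCode = DSDetachJob(hJob)
         hJob = DSAttachJob(JobName, DSJ.ERRNONE)
      End
      ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
```

Logging the outcome and sending failure email would then happen here, in the routine, rather than as extra links cluttering the sequence.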

I think you need containers in sequences. These should not function like shared containers but should hide groups of jobs. This would make hundreds of jobs viewable. For example a container for:

1. Source to sequential jobs or my extraction jobs.
2. Building surrogate key starting values.
3. Hash lookups.
4. Loading dimension tables.
5. Loading fact tables.
6. Aggregate jobs.

A second methodology might be based on dependencies, like building tables without foreign keys first. Jobs which load tables with foreign keys are dependent on those jobs: the key needs to be loaded before it can be referenced. This may apply to an ODS more than a DW or datamart, but if you have foreign keys then they need to be loaded first. The only way this is not true is if you drop all constraints before loading tables.

I always wanted this feature in ERwin. It may be in the new version; I have not looked at it in a while. The concept is a GUI drill-down feature where you have containers, folders, categories, or whatever you want to call them in the GUI, and as you double-click on an object more and more detail is exposed. This is how we think about objects: we group them into categories and get more and more detailed as needed to solve the problem. ERwin has the stupid subject areas or views. These are not built into the product. It is not natural or intuitive to browse these drawings with hundreds of tables. These products are supposed to help us break down the complexity of development into manageable concepts, but instead we print these giant schemas and use them as wallpaper. Even when we only print a subject area, it can still be very ugly.

Metadata management is supposed to solve some of this, but what is needed there is the ability to impose our own folders or categories based on our knowledge of the subject area. These folders may change based on the project under development. For instance, let's say we are developing a web front end; that is very different from ETL. Our folders or categories will change, or should change, to help us develop faster and more accurately. As object-oriented development becomes more visual, there is a need for object hiding or encapsulation.

I have always thought that with a little work DataStage could replace ERwin. It is a much more elegant solution. The links could represent joins. DataStage already knows how to create DDL for different databases. The change management piece is weak, though: ERwin understands how to create alter table commands, which means it understands what has changed.

As development moves forward you will see these products merge into one solution, like Web Methods or the IBM Rational products. How cool would it be if DataStage could do all of this? I hate ERwin anyway. It is a very ugly solution, but it has so much momentum. So did Lotus and WordStar. How many people still use those products?
Mamu Kim
peterbaun
Premium Member
Posts: 93
Joined: Thu Jul 10, 2003 5:27 am
Location: Denmark
Contact:

Post by peterbaun »

Hi -

This is turning into a really interesting discussion and there are some very good points. It seems to me, based on the answers, that the graphical job sequencer is very useful when you are not yet familiar with Basic or don't really need very complex job handling - which I guess is not that surprising.

Hopefully there is a dialogue with Ascential about enhancing the graphical sequencers with some of the features mentioned.

Regards
Peter