Job Sequence vs Scheduler

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

shivadrash
Participant
Posts: 8
Joined: Mon Jun 13, 2011 2:55 am
Location: india

Job Sequence vs Scheduler

Post by shivadrash »

Hi,

Though there are already threads that compare a scheduler with sequence jobs, I would like to revisit the question to understand it better. We are currently building a Center of Excellence (CoE) model in our organization, and this decision will affect a large number of people.

There are recommendations to schedule and control the flow of parallel jobs directly in a scheduler like Control-M, rather than having Control-M call a Job Sequence in DataStage.

Let us consider 2 scenarios:
a. the flow consists of 5 to 10 parallel jobs
b. the flow consists of 100 to 200 parallel jobs.

Based on the above scenarios, please advise which approach is recommended:

either multiple parallel jobs controlled by a scheduler, or a single Sequence called from the scheduler?
Regards
Sivanandha
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

*drops 2 cents into cup*

I would say that you want a mixture of both depending on the situation and business need.

Given that your organization is going to run 100 jobs in parallel, you have to understand WHY they are separate jobs. Are they separated by country code? Business unit? Or simply by some 0-100, 101-200, 201-300, etc. factor?

If you decide to HALT the processing of a Business unit, but let everything else run... then do it from your scheduler.

If your quantity of jobs is dynamic based upon some factor determined at run time, then you'd probably favor a sequencer.
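
To make the "dynamic quantity" case a bit more concrete, here is a minimal Python sketch of what a looping sequence does in spirit: the list of work items is only known at run time, so the flow is built on the fly instead of being defined as a fixed set of scheduler entries. This is not DataStage code. The function names, the `LoadUnit` job name and the `BusinessUnit` parameter are all made up for illustration.

```python
from typing import Callable, Dict, Iterable, List


def plan_runs(job_name: str, units: Iterable[str]) -> List[Dict[str, str]]:
    """Build one run request per unit discovered at run time.

    Blank entries and duplicates are dropped so each unit runs once.
    """
    seen = set()
    plan = []
    for unit in units:
        unit = unit.strip()
        if not unit or unit in seen:
            continue
        seen.add(unit)
        plan.append({"job": job_name, "BusinessUnit": unit})
    return plan


def run_plan(plan: List[Dict[str, str]],
             run_job: Callable[[Dict[str, str]], bool]) -> List[str]:
    """Run each request in order and return the units whose run failed."""
    failed = []
    for request in plan:
        if not run_job(request):
            failed.append(request["BusinessUnit"])
    return failed


if __name__ == "__main__":
    # Hypothetical input: in practice this might come from a control table or a file.
    units_today = ["US", "DE", "", "US", "IN"]
    plan = plan_runs("LoadUnit", units_today)
    # Stand-in runner that pretends the "DE" run fails.
    print(run_plan(plan, lambda request: request["BusinessUnit"] != "DE"))
```

With a fixed set of scheduler entries, a new unit would typically need a new job definition; with a loop like this, it only needs a new row in whatever drives `units_today`.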

You're going to be hard pressed to get any concrete answer given to you because your COE policies MUST be based upon requirements from your company needs. Most of us out here are not part of your company. :)


Think of how you want to control your flow.
Think of how you may want to THROTTLE your flow as well.
Taking a known outage on your main corporate database? You might want to hold your jobs loading that database at the scheduler level rather than have them execute and fail at a job level because the system is down.
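
The outage case can be sketched as a simple gate that is evaluated before anything is submitted. This is only an illustration in Python of the kind of decision a scheduler makes for you. The `Outage` class, the job names and the database names are hypothetical, and a real scheduler would normally express this as a calendar, resource or condition rather than code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Outage:
    target: str        # name of the system that is down (hypothetical)
    start: datetime    # outage begins (inclusive)
    end: datetime      # outage ends (exclusive)

    def covers(self, moment: datetime) -> bool:
        return self.start <= moment < self.end


def jobs_to_hold(job_targets: Dict[str, str],
                 outages: Iterable[Outage],
                 now: datetime) -> List[str]:
    """Return the jobs whose target system is in a known outage at `now`."""
    down = {o.target for o in outages if o.covers(now)}
    return sorted(job for job, target in job_targets.items() if target in down)


if __name__ == "__main__":
    targets = {"LoadOrders": "CORPDB", "LoadLogs": "LOGDB"}
    window = [Outage("CORPDB", datetime(2024, 1, 1, 1, 0), datetime(2024, 1, 1, 3, 0))]
    # During the window only the job that loads CORPDB is held.
    print(jobs_to_hold(targets, window, datetime(2024, 1, 1, 2, 0)))
```

Jobs that are held this way never start, so there is nothing to clean up or reset afterwards; everything else keeps running.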

If you use an External Scheduler, then you have the ability to be notified that your environment is down because of a failed job submission. Using the built-in scheduler ... not so much since the whole server might be hosed to begin with.

Good luck, remember to throw your worst case scenarios at your environment and see how you want to react to them. That will help define your COE policies.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Somewhat of an apples and oranges comparison to me as you'll always need a scheduler of some sort but not necessarily Sequence jobs. And, as Paul notes, there really is no "one size fits all" answer here.

Keep in mind the fact that Sequence jobs allow something no "external" scheduler can - the direct passing of information from one object in the flow to another - so if an installation has made heavy use of that functionality, there really isn't any way to break those up and run them via some other mechanism at an atomic level.
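
A rough way to picture that "direct passing of information" is a flow where every step's outputs are available to the steps after it. The Python sketch below is a conceptual model only, not how a Sequence job is implemented. The step names (`extract`, `load`) and keys (`row_count`, `loaded`) are invented for the example.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, str]
Step = Tuple[str, Callable[[Context], Dict[str, str]]]


def run_flow(steps: List[Step], initial: Context) -> Context:
    """Run steps in order, exposing each step's outputs to the later steps."""
    context = dict(initial)
    for name, step in steps:
        outputs = step(context)
        # Prefix with the step name so a later step can tell who produced a value.
        context.update({f"{name}.{key}": value for key, value in outputs.items()})
    return context


def extract(context: Context) -> Dict[str, str]:
    # Hypothetical upstream step that reports how many rows it produced.
    return {"row_count": "1200"}


def load(context: Context) -> Dict[str, str]:
    # The downstream step reads the upstream value directly from the flow.
    return {"loaded": context["extract.row_count"]}


if __name__ == "__main__":
    result = run_flow([("extract", extract), ("load", load)], {"run_date": "2024-01-01"})
    print(result["load.loaded"])
```

If the two steps were split into separate scheduler entries, that value would have to travel some other way (a file, a table, a parameter set), which is the extra work being described here.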
-craig

"You can never have too many knives" -- Logan Nine Fingers
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Most of my clients use a mixture of both based on the need for the particular job stream. Trying to standardize on one and exclude the other would not be efficient or effective, since they both have different capabilities.

In general:
Use Corporate Scheduler to:
- Integrate ETL Job flows with Non-ETL job flows (SQL, etc.)
- Provide overall start / monitoring of ETL sequences.
- Corporate-level notifications for aborts and issues
- Waiting on dropped files to trigger executions
- Start / stop overall job stream based on external events (host down, database down, etc.)

The corporate scheduler may sometimes execute jobs directly, but more frequently it is used to execute job sequences that:
- Provide the ability to reset and restart checkpointed job sequences
- Provide complex ETL sequence architectures (loops, conditional execution)
- Automate distribution of ETL reject files
- Provide easy access to sub-modules (via sub-sequences) for targeted executions or recovery operations.
- ETL developer level notifications of aborts (with detailed status codes)
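
For the "scheduler executes a job sequence" pattern, the glue is usually a small wrapper around the `dsjob` command line. Below is a hedged Python sketch of such a wrapper. It assumes `dsjob` is on the PATH and that, with `-jobstatus`, an exit code of 1 means finished OK and 2 means finished with warnings; check both against your DataStage version. The project, sequence and parameter names in the comments are placeholders.

```python
import subprocess
from typing import Dict, List, Optional

# Assumption: exit codes returned by `dsjob -run -jobstatus`
# (1 = finished OK, 2 = finished with warnings). Verify for your release.
DSJOB_SUCCESS_CODES = {1, 2}


def build_dsjob_command(project: str, job: str,
                        params: Optional[Dict[str, str]] = None) -> List[str]:
    """Assemble the dsjob argument list for running one job or sequence."""
    cmd = ["dsjob", "-run", "-jobstatus"]
    for name, value in (params or {}).items():
        cmd += ["-param", f"{name}={value}"]
    cmd += [project, job]
    return cmd


def scheduler_exit_code(dsjob_status: int) -> int:
    """Translate the dsjob status into the usual 0 = success convention."""
    return 0 if dsjob_status in DSJOB_SUCCESS_CODES else 1


def run_sequence(project: str, job: str,
                 params: Optional[Dict[str, str]] = None) -> int:
    """Run the sequence and return an exit code the scheduler can act on."""
    result = subprocess.run(build_dsjob_command(project, job, params))
    return scheduler_exit_code(result.returncode)


# Example call from a scheduler-invoked script (placeholder names):
#   sys.exit(run_sequence("MyProject", "SeqDailyLoad", {"RunDate": "2024-01-01"}))
```

The translation step matters because a "finished OK" status of 1 would otherwise look like a failure to a scheduler that treats any non-zero exit code as an abort.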
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
shivadrash
Participant
Posts: 8
Joined: Mon Jun 13, 2011 2:55 am
Location: india

Post by shivadrash »

Thanks much for all your comments. This is a lot of information, and I will keep referring to this page while building the CoE capabilities.

It would be really hard to settle on one concrete answer, but yes, we can come up with guidelines. :)
Regards
Sivanandha