Job Sequencer restart/recover best practice

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

idocrm
Participant
Posts: 17
Joined: Wed Jul 23, 2003 9:41 pm

Job Sequencer restart/recover best practice

Post by idocrm »

Hi all,
Does anyone know if there is a set of best practices for using the job sequencer, especially when it comes to restart/recover? If I have 10 jobs running either in parallel or sequentially, and some of them ran OK while some of them failed, how can I restart by just running the ones that failed? Thanks.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

idocrm

Most of us break up job sequences into groups: jobs which can be rerun without a problem, like code file loads, and jobs which may be difficult to rerun, like fact table jobs. DataStage does not store job run history, so you have to write something yourself that works out where a sequence failed and restarts it at that point. Developers have usually done this in batch jobs, and it is very complex to automate. Until Ascential offers a methodology for this, we will all keep writing our own methods. Ascential does give you a lot of routines which will return the job status. If you have large data sets, you need to calculate manually where the job failed so that you do not update or insert those rows again. Most of the time this is not a problem, but if you are aggregating data you have serious problems.
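For example, a minimal sketch using those status routines to decide whether a job needs a reset before it can be rerun might look like this ("FactLoad" is just a placeholder job name):

    * Sketch only: "FactLoad" is a placeholder job name.
    hJob = DSAttachJob("FactLoad", DSJ.ERRNONE)
    Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
    If Status = DSJS.RUNFAILED Or Status = DSJS.CRASHED Then
       * The job aborted last time: reset it so it can be rerun.
       ErrCode = DSRunJob(hJob, DSJ.RUNRESET)
       ErrCode = DSWaitForJob(hJob)
    End
    ErrCode = DSDetachJob(hJob)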

Kim.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Restartability always carries with it a burden of needing to stage data on disk. You have to design for this, since DataStage is intended to keep data in memory as much as possible (for speed).
Best practice has evolved through people's experience; I think most would agree that a hierarchy of control jobs (job sequences) is the easiest way to accomplish restartability. I have implemented a number of these.
I disagree with the advice to abort; I prefer to log warnings and other restart status information, so that recovery can be 100% automatic. However, this does require customizing the code generated when a job sequence is compiled.
In fact, my control jobs never abort. I use DSJ.ERRNONE as the second argument for DSAttachJob, and never call DSLogFatal. I never use ABORT or STOP statements. I never return non-zero codes from before/after subroutines (instead I pass results and status by other means).
As to your particular question, the actual recovery depends on the rules you create. For example, if some jobs fail, do you need to roll back the lot and re-run everything, or is it sufficient to re-run only the failed jobs? Once you've decided on these rules, simply construct recovery jobs and/or control jobs (or adapt existing control jobs) to achieve what your rules specify.
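A minimal fragment in that non-aborting style, with placeholder job and message names, might look like this:

    * Sketch only: placeholder names, never abort, log restart status.
    hJob = DSAttachJob("LoadCustomerDim", DSJ.ERRNONE)
    If Not(hJob) Then
       * Attach failed: record it and carry on rather than aborting.
       Call DSLogWarn("Unable to attach LoadCustomerDim", "ControlJob")
    End Else
       ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
       ErrCode = DSWaitForJob(hJob)
       Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
       If Status = DSJS.RUNFAILED Then
          * Log restart information instead of calling DSLogFatal.
          Call DSLogWarn("LoadCustomerDim failed; flagged for recovery", "ControlJob")
       End
       ErrCode = DSDetachJob(hJob)
    End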

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
dickfong
Participant
Posts: 68
Joined: Tue Apr 15, 2003 9:20 am

Post by dickfong »

I would like to explore more on the exception handling

In our project, we have implemented a restart mechanism similar to what Mike described. We have to abort the job sequencers on a job abort or warning, because if incorrect data got into the target database it would take us even longer to resume the operation.

However, we do not abort our topmost-level sequencer because, as Ray said, that makes it easier to automate the restart. So we record the runtime status and information in a hash file, so that at each stepping point we can determine whether the batch should continue or stop for fixing.
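A minimal sketch of that idea, with invented file, key and variable names, looks something like this:

    * Sketch only: ETL_RUN_STATUS, BatchRunId, JobName and JobStatus
    * are invented names. The hashed file is keyed on run id and job.
    Open "ETL_RUN_STATUS" To StatusFile Else
       Call DSLogWarn("Cannot open ETL_RUN_STATUS", "BatchControl")
       Return
    End

    * At each stepping point, record the job's outcome.
    StatusKey = BatchRunId : "*" : JobName
    Write JobStatus : @FM : Timedate() On StatusFile, StatusKey

    * On restart, skip anything that already finished cleanly.
    RunThisJob = @True
    Read StatusRec From StatusFile, StatusKey Then
       If StatusRec<1> = DSJS.RUNOK Then RunThisJob = @False
    End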

One difficulty of the implementation is aborting the lower-level sequencers when a job aborts. To do that we have to add tails (fail and warning) to each job so that it can log failure information to the status hash file.

By doing this, the graphical sequencers become very complicated, and readability decreases dramatically (we have 500+ jobs in 30+ sequencers).

Does anyone have any suggestions for improving this? Thanks in advance.

Regards,
Dick
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

I have totally avoided using the job sequencer. When you have a large number of small, modular jobs in a given ETL application that run as a set, the number of sequencers required to modularize the job flow becomes self-defeating. If you're trying to use a sequencer to more readily visualize the process flow, you end up having to sacrifice modularity in order to squeeze all of the sequencers into one job. By the time you have zoomed out, you have the proverbial big black dot.

The ultimate conclusion takes you to the need to have a job control mechanism that looks a lot like Microsoft Project. If you've ever worked with Project, you know the elegance behind constructing graphical dependency trees for hundreds of tasks. It's unfortunate that the iconic metaphor was chosen for the Sequencer, because it just does not scale.

My approach was to develop a job control library that reads a simple dependency matrix, which greatly extends the ability to manage hundreds of jobs in a single process. Once you have a dependency tree, you can have a job control process manage the execution of the jobs and track completed and waiting jobs. You control the absolute level of restart capability, milestone tracking, etc. You also gain the ability to start and stop at milestones. You can customize parameter value assignments to each job's needs. (See a recent posting I just did on reading parameters from a file and setting them in a job at runtime; a rough sketch follows.) This is all very simple stuff once you get over wanting to use the sequencer.
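As a rough illustration (the file path and NAME=VALUE layout here are assumptions, not my actual code), setting parameters from a text file before running an attached job can be as simple as:

    * Sketch only: hJob is a handle from DSAttachJob; parameters
    * must be set before DSRunJob is called.
    OpenSeq "/data/etl/params/nightly.txt" To ParamFile Then
       Loop
          ReadSeq ParamLine From ParamFile Else Exit
          ErrCode = DSSetParam(hJob, Field(ParamLine, "=", 1), Field(ParamLine, "=", 2))
       Repeat
       CloseSeq ParamFile
    End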

The only challenge becomes maintaining the dependency tree. One of my clients keeps the tree in an Oracle table. They love that approach because it allows full metadata exposure as to the process execution flow. An Ascential consultant I know actually built a Microsoft Project VB app to let him use Project to maintain the dependency tree and write out the simple dependency matrix. The matrix is just a simple Excel-style spreadsheet listing jobs and a space-separated list of immediate predecessor jobs (a stripped-down control loop driven by such a matrix is sketched below).
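Here is that stripped-down sketch, serial for simplicity (a real engine would run independent jobs in parallel and poll several handles); all variable names are invented, and the matrix is assumed to be acyclic and already loaded:

    * Jobs<J> holds the J'th job name and Preds<J> its space-separated
    * immediate predecessors, read from the dependency matrix.
    Done = "" ;* field-marked list of jobs that finished OK
    Failed = @False
    Loop Until Failed Or DCount(Done, @FM) = JobCount Do
       For J = 1 To JobCount Until Failed
          JobName = Jobs<J>
          Locate JobName In Done Setting Pos Then Continue ;* already run
          * Runnable only when every predecessor has completed OK.
          Runnable = @True
          For P = 1 To DCount(Preds<J>, " ")
             Locate Field(Preds<J>, " ", P) In Done Setting Pos Else Runnable = @False
          Next P
          If Runnable Then
             hJob = DSAttachJob(JobName, DSJ.ERRNONE)
             ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
             ErrCode = DSWaitForJob(hJob)
             If DSGetJobInfo(hJob, DSJ.JOBSTATUS) = DSJS.RUNOK Then
                Done<-1> = JobName
             End Else
                Failed = @True ;* stop here; restart picks up from Done
             End
             ErrCode = DSDetachJob(hJob)
          End
       Next J
    Repeat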

My ultimate point is to realize that DataStage has a wonderful API library of job control functions. Be not afraid! Dive in and discover the power of the underlying BASIC language to create your own job control, tailored to your needs.

Good luck!

Kenneth Bland
Viswanath
Participant
Posts: 68
Joined: Tue Jul 08, 2003 10:46 pm

Post by Viswanath »

Hi Mike,
Could you explain more about the Tags? What do you actually mean by Tags? How do we incorporate this in the sequencer jobs?

Viswanath.S
Viswanath
Participant
Posts: 68
Joined: Tue Jul 08, 2003 10:46 pm

Post by Viswanath »

Mike,
Thanks for the reply. Is there a way to take care of restartability using DataStage itself, rather than using Ctrl-M? I worked on the mainframe side of DataStage, and we used Ctrl-M as the sequencer because the OS/390 version did not provide any way of running or sequencing jobs. Now we have a requirement to transfer data from Oracle to an Oracle data warehouse. Can Ctrl-M be used here? And if I need to take care of restartability and recovery using DataStage only, is that possible?

Regards,


Viswanath.S