Is there a way to restart a job from point of failure?

vbr_03 · Post by **vbr_03** » Wed Aug 08, 2018 10:27 am

Hi,

Is there any way to restart a parallel job so that it resumes loading data from the last point of failure?

leandrohmvieira · Post by **leandrohmvieira** » Wed Aug 08, 2018 1:39 pm

Sequence jobs do have some checkpoint functionality, which allows a sequence to restart from the last checkpoint.

Parallel Jobs and Server Jobs do not have any feature like this. Can you provide some details of your problem?

ray.wurlod · Post by **ray.wurlod** » Thu Aug 09, 2018 12:22 am

Short answer, no.

You may be able to design jobs with a certain degree of restartability but, in general, the amount of effort required would make it not worthwhile.

chulett · Post by **chulett** » Thu Aug 09, 2018 6:37 am

Right, restartable jobs are certainly possible. I've always striven for atomic level job designs ('single units of work') to allow them to be restartable with little or no human intervention. I've posted high level notes here in the past describing the 'framework' we're using now to support that.

Restarting from the point of failure? That's a whole 'nuther kettle of fish, especially if there's any kind of complexity in the job design, and would generally require some kind of... let's say "compromises"... with regard to job speed.

(Technically, the tool I'm using now has a magical checkbox to enable that functionality, but I've yet to try/play with/trust any such feature.)
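To make the speed "compromise" concrete, here is a minimal sketch in plain Python with SQLite. It is not DataStage or Informatica code, and the `load_rows` function, the `target` and `progress` tables, and the `fail_at` switch are all hypothetical names made up for illustration. It assumes the load commits in batches and stores a resume offset in the same transaction as each batch, so a rerun picks up at the first uncommitted row. Smaller batches mean less rework after a failure but more commits, which is where the speed cost comes from.

```python
import sqlite3


def load_rows(conn, rows, batch_size, fail_at=None):
    """Insert rows in batches, committing a resume offset with each batch.

    fail_at is a hypothetical switch that simulates a failure at that row index.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS target (val INTEGER)")
    conn.execute("CREATE TABLE IF NOT EXISTS progress (next_row INTEGER)")
    conn.commit()

    # Resume from the last committed offset, or from row 0 on a first run.
    found = conn.execute("SELECT next_row FROM progress").fetchone()
    start = found[0] if found else 0

    for begin in range(start, len(rows), batch_size):
        batch = rows[begin:begin + batch_size]
        try:
            for offset, val in enumerate(batch):
                if fail_at is not None and begin + offset == fail_at:
                    raise RuntimeError("simulated failure")
                conn.execute("INSERT INTO target VALUES (?)", (val,))
            conn.execute("DELETE FROM progress")
            conn.execute("INSERT INTO progress VALUES (?)", (begin + len(batch),))
            conn.commit()  # rows and offset become durable together
        except Exception:
            conn.rollback()  # discard the partially loaded batch
            raise


rows = list(range(10))
conn = sqlite3.connect(":memory:")

# First run fails partway through the second batch of four rows.
try:
    load_rows(conn, rows, batch_size=4, fail_at=6)
except RuntimeError:
    pass
committed_after_failure = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]

# Rerun resumes at the first uncommitted row instead of starting over.
load_rows(conn, rows, batch_size=4)
final = [r[0] for r in conn.execute("SELECT val FROM target ORDER BY rowid")]
```

After the simulated failure only the first full batch is committed; the rerun loads the remaining rows without duplicating the earlier ones.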

Joel in KC · Post by **Joel in KC** » Tue Aug 14, 2018 3:12 pm

Please let me know where I can find your framework and "single unit of work" notes, as we are trying to move to this type of usage rather than the huge, complex systems that need restarting. Appreciate your time. New to the board. Thx again.

chulett · Post by **chulett** » Tue Aug 14, 2018 7:24 pm

Both are mentioned here with some high level details for the framework. Hope it helps. As noted there, I would really be interested to see if anyone has done anything like that in DataStage; mine is an Informatica implementation, which makes it a tad easier.

ray.wurlod · Post by **ray.wurlod** » Tue Aug 14, 2018 8:22 pm

Where I need this functionality I, like Craig, create small atomic units of work as DataStage jobs, and make use of the restartability capability of sequence jobs to handle that. No point in re-inventing the wheel.

FranklinE · Post by **FranklinE** » Wed Aug 15, 2018 8:52 am

High-level error handling design is where restartability is identified. Error handling is a part of the definition of the unit of work.

Example:

1. Download file. If that fails, fix problem and rerun.
2. Process file. If there are no intermediate points of failure (like commits) and the process fails, fix and rerun.
3. Etc.

DataStage permits jobs that do both functions in one parallel job. If your design does that, your next step is to rewrite the job to create the separate units of work.

Job Sequence design covers the how and where.
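As a rough illustration of the idea (separate units of work, with a controlling sequence that remembers which ones finished), here is a minimal sketch in plain Python. It is not how DataStage implements sequence checkpoints; `run_sequence`, the JSON checkpoint file, and the two step functions are hypothetical names chosen for the example.

```python
import json
import os
import tempfile


def run_sequence(steps, checkpoint_path):
    """Run (name, func) steps in order, skipping any already recorded as done."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)

    executed = []
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run, so skip it
        func()  # an exception here leaves the checkpoint at the last good step
        done.append(name)
        executed.append(name)
        with open(checkpoint_path, "w") as fh:
            json.dump(done, fh)

    # Every step succeeded: clear the checkpoint so the next run starts fresh.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return executed


calls = []
state = {"fail": True}


def download_file():
    calls.append("download")


def process_file():
    if state["fail"]:
        raise RuntimeError("simulated failure")
    calls.append("process")


steps = [("download_file", download_file), ("process_file", process_file)]
checkpoint = os.path.join(tempfile.mkdtemp(), "sequence.json")

# First run: the download succeeds, then processing fails.
try:
    run_sequence(steps, checkpoint)
except RuntimeError:
    pass

# "Fix the problem" and rerun: only the failed unit of work runs again.
state["fail"] = False
rerun = run_sequence(steps, checkpoint)
```

Because each step is its own unit of work, the rerun does not repeat the download; it restarts at the step that failed.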