Sequence - Checkpoints - Restartability

nvalia · Post by **nvalia** » Tue Oct 07, 2014 4:42 am

Hi All,
I am designing an ETL cycle for restartability.

The Main Sequence job (which controls the end-to-end ETL cycle dependency) will be invoked by a wrapper script that validates the state of the job, resets it if needed, and generates a log file.
One Child Sequence job will control all Staging jobs and another Child Sequence job will control the load from Staging to the Star Schema. Additional jobs will be added as needed for pre and post processes.

I understand we can use checkpoints for restartability in case of job failure, so that the Sequence restarts from the aborted job onward and not from the start.

What are the pros and cons of the checkpoint approach? For example, does it impact performance in any way, or is there anything else we should be aware of?

The other option would be a metadata-driven approach where all job names are held in a process table and, before every run, we check the status (Success/Failure, via a flag in the table that defaults to 'N') for each job and proceed accordingly.
But this means additional design and build of scripts, and it adds complexity to the process.
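
To make the metadata-driven option concrete, here is a minimal sketch of the process-table idea in Python with SQLite. The table name `etl_process`, its columns, and the `run_job` callback are all hypothetical stand-ins for illustration; a real implementation would call whatever actually launches the DataStage job and would likely live in the wrapper script and a proper database.

```python
import sqlite3


def init_process_table(conn, job_names):
    """Create the (hypothetical) control table; every job starts flagged 'N'."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_process ("
        "run_order INTEGER PRIMARY KEY, "
        "job_name TEXT UNIQUE, "
        "success_flag TEXT DEFAULT 'N')"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO etl_process (run_order, job_name) VALUES (?, ?)",
        list(enumerate(job_names, start=1)),
    )
    conn.commit()


def run_cycle(conn, run_job):
    """Run every job still flagged 'N', in order, stopping at the first failure.

    run_job is an assumed callback that returns True on success.
    Returns the list of job names that were attempted in this run.
    """
    pending = conn.execute(
        "SELECT job_name FROM etl_process "
        "WHERE success_flag = 'N' ORDER BY run_order"
    ).fetchall()
    attempted = []
    for (job_name,) in pending:
        attempted.append(job_name)
        if not run_job(job_name):
            break  # leave the flag at 'N' so the next run retries this job
        conn.execute(
            "UPDATE etl_process SET success_flag = 'Y' WHERE job_name = ?",
            (job_name,),
        )
        conn.commit()
    return attempted


def reset_cycle(conn):
    """Flag every job 'N' again, ready for the next full ETL cycle."""
    conn.execute("UPDATE etl_process SET success_flag = 'N'")
    conn.commit()
```

Even in this reduced form, the table, the flag maintenance, and the end-of-cycle reset are all things that have to be built and supported, which is the extra complexity mentioned above.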

Has anyone implemented this approach? If so, could you share your experience?

Thanks,
NV

chulett · Post by **chulett** » Tue Oct 07, 2014 5:55 am

There are no "performance impacts" for using checkpoints in a sequence job. And I can't imagine the need to re-invent the wheel or the additional complexity that would add when almost literally all you have to do is check a box.

Everything will automatically get a checkpoint, but you do have the option to say "Do not checkpoint" for any task that would always need to run regardless of the overall status. In case of errors in the sequence job, make sure it aborts rather than just stops. That's what "activates" the checkpoints, in a manner of speaking, such that your final job status is "Aborted Restartable" rather than simply "Aborted". Then you can either reset the job to start over from the beginning or simply run it, and it will restart at the failure point as you noted.
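
The behaviour described here can be pictured with a small toy model in Python. This is only a sketch of the semantics, not DataStage code or its API: the class name, the tuple layout, and the status strings other than "Aborted Restartable" are assumptions made for illustration.

```python
class CheckpointedSequence:
    """Toy model of a restartable sequence (illustrative only)."""

    def __init__(self, activities):
        # Each activity is (name, callable, checkpointed); checkpointed=False
        # plays the role of the "Do not checkpoint" option.
        self.activities = activities
        self.completed = set()
        self.status = "Not run"

    def reset(self):
        """Forget all checkpoints so the next run starts from the beginning."""
        self.completed.clear()
        self.status = "Reset"

    def run(self):
        """Run the sequence and return the names of the activities executed."""
        executed = []
        for name, action, checkpointed in self.activities:
            if checkpointed and name in self.completed:
                continue  # finished in an earlier, aborted run: skip it
            executed.append(name)
            try:
                action()
            except Exception:
                # Aborting (rather than just stopping) keeps the checkpoints.
                self.status = "Aborted Restartable"
                return executed
            if checkpointed:
                self.completed.add(name)
        self.completed.clear()  # a clean finish discards the checkpoints
        self.status = "Finished"
        return executed
```

In this model, running again after an abort skips the checkpointed activities that already finished, re-runs any "Do not checkpoint" activity, and resumes at the failed one; calling `reset()` first makes the next run start from the top.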

FranklinE · Post by **FranklinE** » Tue Oct 07, 2014 9:04 am

Our design adds just one thing to Craig's important call for simplicity: every checkpoint sequence stage has a terminator activity on the abort condition link as the "exit" point. It lets you add a final message text but more importantly lets you control other processes that may be running.

About restartability: you mention child sequences. I don't think that's a good design decision myself, because it adds an extra layer to everything and makes the Director output more complex. But the main point is that the checkpoint automatically resets the aborted parallel jobs under the job activity, making your wrapper script's query of job statuses redundant. The checkpoint sees the abort status and issues a reset before restarting the job.

Edit: we use looping in our job sequences. A checkpoint in the loop also restarts on the aborted iteration of the loop. We find this to be very beneficial.

nvalia · Post by **nvalia** » Tue Oct 07, 2014 9:34 am

Thank you Chulett and Franklin for the detailed response.

Franklin, I was thinking of Child Sequences only in the context of the non-checkpoint approach. But based on the comments above, I will definitely go with the checkpoint approach.

FranklinE · Post by **FranklinE** » Tue Oct 07, 2014 9:39 am

You're welcome. The best thing about DataStage is that it can do so many things. The worst thing about DataStage is that it lets you do too many things.

chulett · Post by **chulett** » Tue Oct 07, 2014 11:23 am

Make sure your child sequences abort as well when there's a problem. That will be communicated upstream so your main sequence can be aborted as well. Then a restart will find its way back down the rabbit hole to where it needs to restart no matter how deep it needs to go.
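
A rough way to picture that "rabbit hole" is the toy Python model below. It is purely illustrative and assumes every level is checkpointed; the `Sequence` class, the `Aborted` exception, and the job names in the usage note are hypothetical and have nothing to do with the actual DataStage implementation.

```python
class Aborted(Exception):
    """Raised when an activity fails, so the failure travels up to the parent."""


class Sequence:
    """Toy checkpointed sequence whose activities are jobs or child sequences."""

    def __init__(self, name, activities):
        self.name = name
        # Each activity is either a Sequence or a (job_name, callable) pair.
        self.activities = activities
        self.done = set()  # activities that finished before an abort

    def run(self, trace):
        """Run pending activities, appending each executed job name to trace."""
        for activity in self.activities:
            is_child = isinstance(activity, Sequence)
            key = activity.name if is_child else activity[0]
            if key in self.done:
                continue  # checkpointed as complete, skip on restart
            try:
                if is_child:
                    activity.run(trace)  # the child keeps its own checkpoints
                else:
                    trace.append(key)
                    activity[1]()
            except Exception as exc:
                # Abort this level too, so the parent sees the failure.
                raise Aborted(f"{self.name} aborted at {key}") from exc
            self.done.add(key)
        self.done.clear()  # clean finish: nothing left to restart from
```

With a main sequence holding a staging child and a loading child, a failure in the second loading job aborts both the loading sequence and the main sequence. Running the main sequence again skips staging entirely, descends into loading, skips its first job, and re-runs only the job that failed.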

ray.wurlod · Post by **ray.wurlod** » Tue Oct 07, 2014 3:34 pm

An alternative to Craig's most recent suggestion is to make sure that your sub-sequences throw a warning (but do not need to abort). This, too, can be detected upstream (set parent sequences to log a warning if any activity does not finish with a status of OK).

If you really want an esoteric solution, the sub-sequence can (through a Routine activity) log a warning in its controller's log.

chulett · Post by **chulett** » Tue Oct 07, 2014 3:42 pm

I was assuming the child / sub-sequences were checkpointed as well...