Sequence - Checkpoints - Restartability
Posted: Tue Oct 07, 2014 4:42 am
Hi All,
Designing the ETL cycle for Restartability.
The Main Sequence (that controls the end to end ETL cycle dependency) job will be invoked using a Wrapper script that does initial validation of the state of the job and resets accordingly and generates logs file.
One Child Sequence job will control all Staging jobs and another Child Sequence job the loading of from Staging to the Star Schema. Additional jobs as needed for Pre and Post processes.
I understand we can use Checkpoints for Restart ability in case of job failure so the Sequence Restarts from the Aborted job onward and not from the start
What are the Pros and Cons of using Check Points approach, like does it impact performance in any way or anything else we should be aware of?
Other option would be a Metadata Driven Approach where all job names are in a process table and before every run, check the status if Success/Failure (via a Flag in the table - Default 'N') for that job and proceed accordingly?
But this means additional design and build of scripts and complexity to the process
Anyone implemented this approach and if you could share your experience
Thanks,
NV
Designing the ETL cycle for Restartability.
The Main Sequence (that controls the end to end ETL cycle dependency) job will be invoked using a Wrapper script that does initial validation of the state of the job and resets accordingly and generates logs file.
One Child Sequence job will control all Staging jobs and another Child Sequence job the loading of from Staging to the Star Schema. Additional jobs as needed for Pre and Post processes.
I understand we can use Checkpoints for Restart ability in case of job failure so the Sequence Restarts from the Aborted job onward and not from the start
What are the Pros and Cons of using Check Points approach, like does it impact performance in any way or anything else we should be aware of?
Other option would be a Metadata Driven Approach where all job names are in a process table and before every run, check the status if Success/Failure (via a Flag in the table - Default 'N') for that job and proceed accordingly?
But this means additional design and build of scripts and complexity to the process
Anyone implemented this approach and if you could share your experience
Thanks,
NV