Checkpoint restart


benny.lbs
Participant
Posts: 125
Joined: Wed Feb 23, 2005 3:46 am


Post by benny.lbs »

As you may know, DataStage job sequences have a feature called "Checkpoint restart". However, it seems to support only one level of nesting.

My case is as follows.

Top level: Seq000 contains two sequences
Seq001
Seq002

Second level:
Seq001 contains three jobs
Job001
Job002
Job003

Seq002 contains three jobs
Job004
Job005
Job006

If Seq001 ends normally and Job005 aborts, then Seq002 and Seq000 abort. According to the checkpoint restart feature, a rerun should restart from Job005. Instead, it restarts from the beginning of Seq002, at Job004 (not Job005).

Is this a well-known issue? Does anyone know whether two or more levels will be supported in a later release?
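
To make the expected behaviour concrete, here is a minimal Python sketch of the nesting described above. It is a toy model, not DataStage code: the `Job` and `Sequence` classes, the `fail` switch and the `done` set standing in for checkpoints are all assumptions made for illustration. It only shows what a restart would look like if checkpoints were honoured at every level.

```python
# Toy model of nested job sequences with checkpoints (NOT DataStage code).
class Job:
    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail  # hypothetical switch used to simulate an abort

    def run(self, log):
        log.append(self.name)      # record that the job was executed
        return not self.fail       # True means "finished OK"


class Sequence:
    def __init__(self, name, activities):
        self.name = name
        self.activities = activities
        self.done = set()  # checkpoints: activities that finished OK

    def run(self, log):
        for act in self.activities:
            if act.name in self.done:
                continue           # checkpointed earlier, skip on restart
            if not act.run(log):
                return False       # abort, but keep the checkpoints
            self.done.add(act.name)
        self.done.clear()          # a clean finish clears the checkpoints
        return True


job005 = Job("Job005", fail=True)
seq001 = Sequence("Seq001", [Job("Job001"), Job("Job002"), Job("Job003")])
seq002 = Sequence("Seq002", [Job("Job004"), job005, Job("Job006")])
seq000 = Sequence("Seq000", [seq001, seq002])

first_run = []
seq000.run(first_run)    # aborts at Job005
job005.fail = False      # pretend the cause of the abort has been fixed
second_run = []
seq000.run(second_run)   # rerun of the top-level sequence

print(first_run)
print(second_run)
```

In this model the rerun executes only Job005 and Job006, which is the behaviour expected in the question above.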
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Which version of DataStage are you using? This works fine in DS 7.1 and above.
Mamu Kim
benny.lbs
Participant
Posts: 125
Joined: Wed Feb 23, 2005 3:46 am

Post by benny.lbs »

I am working on DS 7.1, but it still fails.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Looks like you only have Checkpoint restart enabled on the top level job sequence. What happens if you enable it on lower levels also?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
benny.lbs
Participant
Posts: 125
Joined: Wed Feb 23, 2005 3:46 am

Post by benny.lbs »

Yes, Checkpoint restart is enabled on all the lower levels as well.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Do you see the jobs being reset before they run again?
When they are in an aborted state, do you see only the Reset option in Director for those sequence jobs?
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
benny.lbs
Participant
Posts: 125
Joined: Wed Feb 23, 2005 3:46 am

Post by benny.lbs »

Yes, the aborted job is reset automatically, because I have set the option "Reset if required, then run".
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

OK, and does the log show that they were reset when you rerun the top-level sequence job?
Roy R.
benny.lbs
Participant
Posts: 125
Joined: Wed Feb 23, 2005 3:46 am

Post by benny.lbs »

Oh, yes, now I see what is happening: the sub-level sequence has been reset, so it starts from the beginning.

Does that mean that for sub-level sequences we should set the option "Run" instead of "Reset if required, then run"?
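
The difference between the two settings, as observed here, can be sketched in a few lines of Python. This is a toy model, not DataStage code: `SubSequence`, `invoke` and the mode names `'run'` and `'reset_then_run'` are hypothetical, and the key assumption (a reset discards the checkpoints) is taken from the observation above rather than from any documented behaviour. Other posters in this thread report different results in their environments.

```python
# Toy model (NOT DataStage code) of the observation that resetting an
# aborted sub-sequence makes it start again from its first job.
class SubSequence:
    def __init__(self, jobs):
        self.jobs = jobs        # job names, in execution order
        self.done = set()       # checkpointed (successfully finished) jobs
        self.aborted = False

    def reset(self):
        # Assumption taken from the thread: a reset wipes the checkpoints.
        self.done.clear()
        self.aborted = False

    def run(self, failing=()):
        executed = []
        for job in self.jobs:
            if job in self.done:
                continue        # skip work that is already checkpointed
            executed.append(job)
            if job in failing:
                self.aborted = True
                return executed
            self.done.add(job)
        self.done.clear()       # a clean finish clears the checkpoints
        self.aborted = False
        return executed


def invoke(sub, mode, failing=()):
    """mode is 'run' or 'reset_then_run' (hypothetical names for the two
    execution actions discussed in the thread)."""
    if mode == "reset_then_run" and sub.aborted:
        sub.reset()
    return sub.run(failing)


def restart_after_abort(mode):
    sub = SubSequence(["Job004", "Job005", "Job006"])
    invoke(sub, mode, failing={"Job005"})  # first run aborts at Job005
    return invoke(sub, mode)               # rerun after the cause is fixed


print(restart_after_abort("reset_then_run"))
print(restart_after_abort("run"))
```

Under these assumptions the rerun with `'reset_then_run'` executes all three jobs again, while the rerun with `'run'` resumes at Job005.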

benny.lbs
Participant
Posts: 125
Joined: Wed Feb 23, 2005 3:46 am

Post by benny.lbs »

However, I have encountered another case.

Options:
Add checkpoints so sequence is restartable on failure (ON)
Automatically handle activities that fail (ON)

In my previous case, I used an "Exception Handler" and a "Routine Activity" (calling DSJobAbort) to force the sequence to abort if any job aborted. In that case, the status of the sub-level sequence is Aborted/Restartable, so no matter which option I set in the top-level sequence ("Run" or "Reset if required, then run"), rerunning the top level resets the aborted sub-level.

If I don't use the "Exception Handler" to force an abort, the sub-level's status is Finished/Restartable (it stopped at the aborted job). However, the top level then does not detect the failure in the sub-level and continues with the next Job Activity in the top level.

That is the problem. If the activities have no dependencies, that is fine, but in most cases they do.

Now suppose the top level finished normally (assuming there are no dependencies). When I rerun the top level, it starts from the beginning. When it reruns the failed sub-level, the sub-level is not reset (because its previous status is Finished/Restartable) and starts from the aborted job. But that isn't what I want, because the top level starts from the beginning.

I would be grateful if anyone could shed some light on how to use Checkpoint Restart.

Many thanks!

Regards,
Benny

DSLover
Participant
Posts: 1
Joined: Tue Jul 05, 2005 8:18 am

Post by DSLover »

I am a DataStage beginner and have run into the same case. Can anyone help?

Thanks in advance!
vcannadevula
Charter Member
Posts: 143
Joined: Thu Nov 04, 2004 6:53 am

Post by vcannadevula »

When you use checkpoint restart, the sequences should be designed so that any abort at the bottom level bubbles all the way up. The options above are fine for that. Another catch: when you make your sequences restartable and set "Reset if required, then run", the restart will not work. With "Reset if required, then run", the checkpoint is erased as well when there is an abort.

So, use the restartability option together with the "Run" option.
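
The "bubble all the way up" point can be illustrated with a small Python sketch. It is only an analogy, not DataStage code: the `JobAborted` exception, the `propagate` flag and the function names are hypothetical stand-ins for a sub-sequence that either aborts when one of its jobs aborts or simply stops and reports itself as finished.

```python
# Toy model (NOT DataStage code): does a failure inside a sub-sequence
# reach the top-level sequence?
class JobAborted(Exception):
    pass


def run_job(name, failing, log):
    log.append(name)
    if name in failing:
        raise JobAborted(name)


def run_sub_sequence(jobs, failing, log, propagate):
    try:
        for job in jobs:
            run_job(job, failing, log)
    except JobAborted:
        if propagate:
            raise  # the abort bubbles up to the caller
        # otherwise the sub-sequence just stops and looks "finished"


def run_top_level(propagate):
    log = []
    failing = {"Job002"}  # hypothetical failing job
    try:
        run_sub_sequence(["Job001", "Job002", "Job003"], failing, log, propagate)
        run_sub_sequence(["Job004", "Job005", "Job006"], failing, log, propagate)
        status = "finished"
    except JobAborted:
        status = "aborted"
    return status, log


print(run_top_level(propagate=True))
print(run_top_level(propagate=False))
```

With `propagate=True` the top level stops after Job002 and reports an abort. With `propagate=False` it carries on into the second sub-sequence even though Job002 failed, which is the situation described earlier in the thread for a sub-sequence that ends as Finished/Restartable.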
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

vcannadevula wrote: When you go for "Reset if required, then run" this option, when there is an abort even the checkpoint will be erased.
This is not true. We do this all the time. When you restart a sequence, it should run only the jobs that have not yet run or that aborted. This works fine. We use it in all our sequences. All our jobs are set to "Reset if required, then run".
Mamu Kim
benny.lbs
Participant
Posts: 125
Joined: Wed Feb 23, 2005 3:46 am

Post by benny.lbs »

kduke,

Actually, what I encountered is that the checkpoint was erased. This has puzzled me for a long time.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

This works for me. I am not sure what the difference is between what you are doing and what we do, but I expect something is set differently.
Mamu Kim