Restarting aborted Sequence jobs with Loop stages

Posted: Tue Feb 20, 2007 12:42 am
by chulett
There have been some questions here regarding checkpointed Sequence jobs with Loop stages and what happens when one aborts deep inside the loop. I had the opportunity to witness this and thought I'd pass along my experience.

The setup: a complex Sequence job with stages both before and after the Loop start/end stages. Lots of things happen inside the loop as well: routines, Server jobs, Sequence jobs, branching paths, all kinds of fun. The 'master' Sequence is checkpointed and so are the two Sequence jobs running inside the loop. The loop runs as a 'Numeric Loop' and iterates from 1 to 288 (max) on a typical day.

Today it cranked up and made 64 successful iterations before cratering on the 65th due to database issues inside one of the 'sub sequence' jobs that it runs. Upon restart I watched as it skipped the completed checkpoints and entered the loop. It then proceeded to skip through the entire loop body and circle back around as fast as it could write log entries. I started to worry as it logged stuff like mad, hard-looping through iteration after iteration up to 63, 64, and then it finally hit 65. Once inside the 65th iteration, it skipped up to the failed job in the failed sequence and then picked up where it left off, running like normal from that point forward.
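For anyone who wants to picture that restart, here is a toy Python model of the behaviour as observed. It is only a sketch: it is not how DataStage implements checkpoints internally, and every name in it (`run_loop`, `completed`, `fail_at`, the step names) is made up for illustration. The assumption is simply that each finished step in each iteration leaves a checkpoint, and that a restart walks the loop from 1 again, skipping anything already checkpointed.

```python
def run_loop(first, last, steps, completed, fail_at=None):
    """Walk a numeric loop, skipping steps that already have a checkpoint.

    completed: set of (iteration, step) pairs recorded by earlier runs
               (hypothetical stand-in for the checkpoint data).
    fail_at:   optional (iteration, step) pair at which to simulate an abort.
    Returns a list of (iteration, step, action) log entries.
    """
    log = []
    for i in range(first, last + 1):
        for step in steps:
            key = (i, step)
            if key in completed:
                # Already done in a previous run: log it and move on.
                log.append((i, step, "skipped"))
                continue
            if key == fail_at:
                # Simulated abort: no checkpoint is written for this step.
                log.append((i, step, "aborted"))
                return log
            completed.add(key)
            log.append((i, step, "ran"))
    return log


# Hypothetical loop body; the real one has many more stages.
steps = ["routine", "sub_sequence_a", "sub_sequence_b"]
checkpoints = set()

# First run: dies in the 65th iteration inside one of the sub sequences.
first_run = run_loop(1, 288, steps, checkpoints, fail_at=(65, "sub_sequence_b"))

# Restart: same checkpoint data, no failure this time.
restart = run_loop(1, 288, steps, checkpoints)
skipped = [entry for entry in restart if entry[2] == "skipped"]
```

In this model the restart logs a "skipped" entry for every step of iterations 1 through 64 and for the steps in iteration 65 that finished before the failure, then runs normally from the failed step onward, which matches the flood of log entries followed by normal processing.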

Pretty cool. 8)

Posted: Tue Feb 20, 2007 4:00 am
by ray.wurlod
Someone is going to need to hack to discover how this works. It's clearly not a simple checkpoint mechanism when a loop is involved.

Posted: Tue Feb 20, 2007 8:20 am
by chulett
Not sure if this is significant or not, but I do have the ECASE 62595 patch installed, which helps avoid a 'stack overflow' problem with long-running loops. FYI.