Time Limits on jobs
Moderators: chulett, rschirm, roy
Hi,
I am trying to implement some kind of a timeout feature on ETL jobs. We have a few master sequencers (as in they run other jobs), and each of these master sequence has its own expected average execution time. At times due to numerous factors out of our control, they end up running too long. I would like to stop/abort the jobs if they are running too long. This is a production environment, and this way we get notified if a process aborts. We dont get notified if a process runs too long.
Now I have tried to achieve that by having a Wait For File activity start independently on the same canvas in the master sequence. This essentially triggers the first job activity in the master sequence along with the Wait For File activity. There is a done file that is created at the end of the master sequence, and the Wait For File activity waits for this done file. So if I have a 3 hr timeout on the Wait For File, and the done file is not created within 3 hrs, then the process aborts using an abort stage. This approach seems to work only if there is one job activity in the master sequence. If there are more, the first job activity and the Wait For File start simultaneously as expected, but execution does not move on to the next job activity after the first one is complete. It ends up with the first activity complete and the Wait For File activity waiting until it times out. Obviously this results in aborts every time. Any tweaks or ideas to achieve timeout ability for jobs?
Thanks folks.
-
- Participant
- Posts: 56
- Joined: Mon Oct 16, 2006 7:32 am
-
- Premium Member
- Posts: 1255
- Joined: Wed Feb 02, 2005 11:54 am
- Location: United States of America
How about this?
Have the Wait_for_file activity and the Terminator stage connected to each other but independent (not linked to any other stages) in the master job sequence.
And have the Wait_for_file activity wait for a dummy file to appear for three hours. So, if your job sequence is still running after 3 hrs, the Wait_for_file activity will fail, and then the Terminator stage will send STOP requests to all the activities in your master sequence.
Whale.
Code:
Same Master Sequence (not linked to below)
Wait_For_File --------> Terminator
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
That's exactly what he is doing. Technically, it won't go to the next step because both Job Activity 1 and the Wait For File activity are fired in parallel; the sequence waits for both of them to finish before moving on to the next step.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
How about you change your design a little bit?
Design two job sequences.
One will have both your job activities connected.
The second job sequence will have a Wait For File activity connected to an Execute Command stage. The Wait For File activity will have the same specs, and the Execute Command stage will have the following command in it:
Code:
dsjob -stop <ProjectName> <JobSequence1Name>
Fire both the jobs at once. This way they will be independent of each other, and if the timeout triggers, it will send a stop request for the other job sequence.
That's the only way I can think of at the moment, but you need to test it thoroughly. I don't know how many layers are being controlled by your Master Sequence, so you need to test it. But it will work. Not the best solution, but something to fall back upon.
The jobs in the master sequence should run in sequence, one after the other, so running them independently won't work. The only solution I could come up with so far is the last one suggested: run the master sequence as a single job activity in another super master sequence, with the Wait For File stage in it. That will do the trick, but it's not elegant. I was wondering if there is some way to do it in the master sequencer itself.
No, no, you got it wrong. Your master sequence will have two job activities, the second one dependent upon the first. Say this job is called myMasterSeq. The second sequence job will have a Wait For File activity connected to the Execute Command activity stage as I described in my previous post.
Fire both these jobs at the same time as independent jobs. If a time out occurs, the execute command activity will fire, executing the stop command for myMasterSeq. This way you will achieve what you want.
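For reference, firing both sequences at once from a unix script might look something like the sketch below. The project and sequence names (MyProject, TimeoutWatcherSeq) are placeholders, not taken from the thread, and the `DSJOB` variable is only there so the function can be exercised without a live DataStage engine:

```shell
#!/bin/sh
# Launch the master sequence and the timeout-watchdog sequence in parallel.
# DSJOB lets a test (or a non-standard install path) substitute for the
# real dsjob CLI; by default it is just "dsjob".
DSJOB=${DSJOB:-dsjob}

fire_both() {
  project=$1
  # dsjob -run -wait blocks until the job finishes, so background each call
  $DSJOB -run -wait "$project" myMasterSeq &
  $DSJOB -run -wait "$project" TimeoutWatcherSeq &
  wait    # return only once both sequences have finished
}
```

Whichever sequence finishes first simply exits; if the watchdog's Wait For File times out instead, its Execute Command stage issues the `dsjob -stop` against myMasterSeq.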
That is a good idea too. I think my idea kind of achieves the same purpose. I would prefer to have only one master sequencer to deal with. A lot of our master sequencers are also built to be restartable, so having two jobs scheduled to run simultaneously in production seems a little more cumbersome than having only one super master sequence run.
How about having an 'Execute_Command' stage connected to a 'Terminator' stage,
and running the unix 'sleep 10800' command in the 'Execute_Command' stage?
Will that help? Or will Job_Activity_2 still wait on 'sleep' to complete?
Whale.
Code:
Same Master Sequence (not linked to below)
'Execute_Command' --------> Terminator
That is correct, there is no abort feature from the command line. If you're using a single master control job, then instead of an Execute Command activity you can use a Routine activity and call UtilityAbortToLog(). That will force abort the job.
Or, if your warning level is set to 1, then you can do something like this, which will log a warning message that will be picked up by the DSEngine and force abort the job:
Code:
dsjob -log -warn <ProjectName> <JobName>
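As a sketch, that command could be wrapped in a small shell function; `dsjob -log` reads the message text from standard input. The project/job names and the warning wording here are placeholders, and `DSJOB` exists only so the function can be tested without a DataStage engine:

```shell
#!/bin/sh
# DSJOB can be overridden with a stub when testing without DataStage.
DSJOB=${DSJOB:-dsjob}

log_timeout_warning() {
  project=$1; job=$2
  # dsjob -log -warn writes a warning entry to the job's log, taking the
  # message text from stdin; with the job's warning limit set to 1,
  # that single warning is enough to force the abort.
  echo "Watchdog: $job exceeded its time limit" | $DSJOB -log -warn "$project" "$job"
}
```

A caller would invoke it as, e.g., `log_timeout_warning MyProject myMasterSeq` once the timeout fires.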
It has to be from the command line because I have a timer running in the background every time a job is invoked through unix. This timer will check after a certain time to see if the job is still running. If it is, then ideally I would like the timer process to abort the job. This will get production control's attention and they will call us. Once we fix the problem and request a re-run, the jobs in the sequencer will start from where they stopped. I guess I could have a roundabout way of trying to kill the job, where I grep for the process with the job name and issue a kill command in the script, but I would prefer a more DataStage way of killing it, using a DataStage-provided command line to kill the job instead of stopping the job.
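A DataStage-flavoured version of that timer check could use `dsjob -jobinfo` to read the status and `dsjob -stop` to halt the sequence. This is only a sketch: since the CLI offers no force-abort, it issues a stop; the names are placeholders, and `DSJOB` is a test hook, not part of the real command:

```shell
#!/bin/sh
# Stop a job via the dsjob CLI if it is still running after the time limit.
# DSJOB allows a stub to stand in for the real dsjob command in tests.
DSJOB=${DSJOB:-dsjob}

stop_if_still_running() {
  project=$1; job=$2
  # dsjob -jobinfo prints a "Job Status" line, e.g. "Job Status : RUNNING (0)"
  status=$($DSJOB -jobinfo "$project" "$job" | grep 'Job Status')
  case "$status" in
    *RUNNING*)
      $DSJOB -stop "$project" "$job"   # job overran: issue the stop request
      return 0 ;;
    *)
      return 1 ;;                      # job already finished: nothing to do
  esac
}

# The background timer would sleep out the time limit first, e.g.:
#   sleep 10800 && stop_if_still_running MyProject myMasterSeq
```

The grep-and-kill fallback mentioned above would still work, but this keeps the cleanup inside DataStage's own tooling so the sequence can restart from where it stopped.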