Time Limits on jobs

Post questions here related to DataStage Server Edition for such areas as Server job design, DS BASIC, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

NEO
Premium Member
Posts: 163
Joined: Mon Mar 22, 2004 5:49 pm

Time Limits on jobs

Post by NEO »

Hi,
I am trying to implement some kind of a timeout feature on ETL jobs. We have a few master sequencers (as in, they run other jobs), and each of these master sequences has its own expected average execution time. At times, due to numerous factors out of our control, they end up running too long. I would like to stop/abort the jobs if they are running too long. This is a production environment, and this way we get notified if a process aborts. We don't get notified if a process runs too long.
I have tried to achieve that by having a Wait For File activity starting independently on the same canvas in the master sequence, so the first job activity in the master sequence starts along with the Wait For File activity. A Done file is created at the end of the master sequence, and the Wait For File activity waits for this Done file. So if I have a 3 hr timeout on the Wait For File and the Done file is not created within 3 hrs, the process aborts using an Abort stage (rough sketch below). This approach seems to work only if there is one job activity in the master sequence. If there are more, the first job activity and the Wait For File start simultaneously as expected, but execution does not move on to the next job activity after the first one is complete. It ends up with the first activity complete and the Wait For File activity waiting until it times out. Obviously this results in aborts every time. Any tweaks or ideas to achieve a timeout ability for jobs?
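Roughly, the intended layout is this (the stage names here are just placeholders):

Code: Select all

Master Sequence (two independent branches on one canvas)

Job_Activity_1 ----> Job_Activity_2 ----> Execute_Command (touch Done file)

Wait_For_File (Done file, 3 hr timeout) ----> Abort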
Thanks folks.
johnthomas
Participant
Posts: 56
Joined: Mon Oct 16, 2006 7:32 am

Post by johnthomas »

Is the next job activity independent in the sequencer?
JT
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

How about this?

Have the Wait_for_file activity and the Terminator stage connected to each other, but independent (not linked to any other stages), in the master job sequence.

And have the Wait_for_file activity wait for a dummy file to appear for three hours. So, if your job sequence is still running after 3 hrs, the Wait_for_file activity will fail and the Terminator stage will send STOP requests to all the activities in your master sequence.

Code: Select all


Same Master Sequence (not linked to below)


Wait_For_File --------> Terminator

Whale.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

That's exactly what he is doing. Technically, it won't go to the next step, because both Job Activity 1 and the Wait For File activity are fired in parallel; the sequence waits for both of them to finish before moving to the next step.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

How about changing your design a little bit?
Design two job sequences.
One will have both your job activities connected.
The second job sequence will have a Wait For File activity connected to an Execute Command stage. Your Wait For File activity will have the same specs, and the Execute Command stage will have the following command in it:

Code: Select all

dsjob -stop <ProjectName> <JobSequence1Name>
Fire both the jobs at once. This way they will be independent of each other, and if the timeout triggers, it will send a stop request for the other job sequence.
That's the only way I can think of at the moment, but you need to test it thoroughly; I don't know how many layers are being controlled by your Master Sequence. It will work, though. Not the best solution, but something to fall back upon.
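From Unix, firing both sequences at once could look something like this rough sketch (the names are placeholders; dsjob -run starts a job and returns without waiting, since -wait is not given):

Code: Select all

# start the worker sequence and the timeout watchdog together
dsjob -run <ProjectName> <JobSequence1Name>
dsjob -run <ProjectName> <TimeoutSequenceName>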
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
NEO
Premium Member
Posts: 163
Joined: Mon Mar 22, 2004 5:49 pm

Post by NEO »

The jobs in the master sequence should run in sequence, one after the other, so running them independently won't work. The only solution that I could come up with so far is the last one suggested: run the master sequence as a single job activity in another super master sequence :) with the Wait For File stage in it (rough sketch below). That will do the trick, but it's not elegant. I was wondering if there is some way to do it in the master sequencer itself.
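Something like this (names are placeholders):

Code: Select all

Super Master Sequence (two independent branches)

Job_Activity (runs the whole master sequence) ----> Execute_Command (touch Done file)

Wait_For_File (Done file, 3 hr timeout) ----> Abort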
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

No, no, you got it wrong. Your master sequence will have two job activities, the second one dependent upon the first. Say this job is called myMasterSeq. The second sequence job will have the Wait For File activity connected to the Execute Command activity as I described in my previous post.
Fire both these jobs at the same time as independent jobs. If a timeout occurs, the Execute Command activity will fire, executing the stop command for myMasterSeq. This way you will achieve what you want.
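As a rough sketch (myMasterSeq is from above; the watchdog sequence name is made up):

Code: Select all

myMasterSeq:         Job_Activity_1 ----> Job_Activity_2

TimeoutWatcherSeq:   Wait_For_File ----> Execute_Command (dsjob -stop <ProjectName> myMasterSeq)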
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
NEO
Premium Member
Posts: 163
Joined: Mon Mar 22, 2004 5:49 pm

Post by NEO »

That is a good idea too. I think my idea kind of achieves the same purpose, and I would prefer to have only one master sequencer to deal with. A lot of our master sequencers are also built to be restartable, so having two jobs scheduled to run simultaneously in production seems a little more cumbersome than having only one super master sequence run.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Ok. Now you throw in the "Restartability" part. Anyways, now you have a few ways. Enjoy :wink:
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

How about having an 'Execute_Command' stage connected to a 'Terminator' stage?

Code: Select all


Same Master Sequence (not linked to below) 


'Execute_Command'  --------> Terminator 

And run the Unix 'sleep 10800' command in the 'Execute_Command' stage.

Will that help? Or will Job_Activity_2 still wait on 'sleep' to complete?

Whale.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Yes, Mr. Whale, it will. Any stage that's fired in parallel will be waited upon to finish before moving forward :wink:
Go ahead, test it for yourself.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

Arrrrghhhh!!! :) I suspected/feared so.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
NEO
Premium Member
Posts: 163
Joined: Mon Mar 22, 2004 5:49 pm

Post by NEO »

How do I abort a job, rather than stop it, from the command line? It doesn't look like dsjob has an abort feature. Since the jobs were built to be restartable, a stop will force them to be reset before they can be invoked again.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

That is correct, there is no abort feature from the command line. If you're using a single master control job then, instead of the Execute Command activity, you can use a Routine activity and call UtilityAbortToLog(). That will force-abort the job.
Or, if your warning limit is set to 1, then you can do something like:

Code: Select all

dsjob -log -warn <ProjectName> <JobName>
This will log a warning message, which will be picked up by the DSEngine and force-abort the job.
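Note that dsjob -log takes the message text from standard input, so from a script you would pipe it in, something like:

Code: Select all

echo "Job exceeded its time limit" | dsjob -log -warn <ProjectName> <JobName>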
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
NEO
Premium Member
Posts: 163
Joined: Mon Mar 22, 2004 5:49 pm

Post by NEO »

It has to be from the command line because I have a timer running in the background every time a job is invoked through Unix. This timer will check after a certain time to see if the job is still running. If it is, then ideally I would like the timer process to abort the job. This will get production control's attention and they will call us. Once we fix the problem and request a re-run, the jobs in the sequencer will start from where they stopped. I guess I can have a roundabout way of trying to kill a job, where I grep for the process with the job name and issue a kill command in the script, but I would prefer a more DataStage way of killing it, using a DataStage-provided command line to kill the job instead of stopping it (rough sketch below).
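A rough sketch of that watchdog (all names are placeholders, and it assumes the job's warning limit is set to 1 so that the logged warning force-aborts it, as discussed above):

Code: Select all

#!/bin/sh
# watchdog sketch -- project name, job name, and timeout are passed in
PROJECT=$1
JOB=$2
TIMEOUT=$3            # seconds, e.g. 10800 for a 3 hr limit

sleep "$TIMEOUT"

# dsjob -jobinfo prints a "Job Status" line; RUNNING means the job is still going
STATUS=`dsjob -jobinfo "$PROJECT" "$JOB" | grep "Job Status"`

case "$STATUS" in
  *RUNNING*)
      # logging a warning trips the warning limit and force-aborts the job
      echo "Watchdog: $JOB exceeded ${TIMEOUT}s" | dsjob -log -warn "$PROJECT" "$JOB"
      ;;
esac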