Routine used to detect runaway jobs.

michaeld
Premium Member
Posts: 88
Joined: Tue Apr 04, 2006 8:42 am
Location: Toronto, Canada

Post by michaeld »

If you run DataStage on Windows, you are probably familiar with runaway jobs: jobs that never complete and keep running even though they are no longer doing any processing. Most of the time this happens because of bad design or limited resources. To deal with this I created a routine that lists runaway jobs. The suggested way to use it is to create a sequence job that takes the output of this routine and notifies an admin when the list is not empty. The sequence job would run every 15 minutes (or whatever interval you want). The routine measures elapsed time from each job's start time. I did not base it on the last update to the log, in order to keep it as lightweight as possible.
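As a rough sketch of the "notify when the list is not empty" step, job control code along the following lines could be run on a schedule. It assumes the routine below is saved as a server routine named ListRunawayJobs, which DataStage catalogs as DSU.ListRunawayJobs with the return value as the first argument. The 60-minute threshold and the name "CheckRunawayJobs" are hypothetical, and writing a warning to the log stands in for whatever notification you actually use.

```
* Sketch only: scheduled check that calls the routine and reports the result.
* Assumption: the routine is cataloged as DSU.ListRunawayJobs.
vMaxMinutes = 60        ; * hypothetical threshold, in minutes
vRunaways = ""
Call DSU.ListRunawayJobs(vRunaways, vMaxMinutes)

* An empty string means no job has exceeded the threshold.
If vRunaways <> "" Then
   Call DSLogWarn("Possible runaway jobs: " : vRunaways, "CheckRunawayJobs")
End
```

In a sequence job the same test would typically sit on the trigger between a Routine Activity and a Notification Activity.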


Code: Select all

***************************************************
* Name:   ListRunawayJobs
* Author: Michael Dann (2007)
* Description: List jobs that have been running for
*              over a given number of minutes.
*
* Args:
* aMaxRunningTime - The max running time before 
*                   you consider it a runaway job 
*                   (in minutes)
*
* Return: comma delimited list of runaway jobs
***************************************************

$INCLUDE DSINCLUDE JOBCONTROL.H      
      
* initialise variables
vListSize = 0 ; vOut = ""; i = 0 ; Ans=""

* Get the current time and date in internal format (seconds since 1970)
CALL !TIMDAT(timeArr)
vSystemDate=Iconv("20":timeArr<3>:"-":timeArr<1>:"-":timeArr<2>,"D-YMD[4,2,2]")
vSystemTime=timeArr<4>*60

* Get list of jobs in this project
vJobList=DSGetProjectInfo(DSJ.JOBLIST)

* Get number of items in list
vListSize = DCount(vJobList, ",")

*For each job check if it is running
IF vListSize > 0 THEN
   FOR i = 1 to vListSize
      vListItem = Field(vJobList, ",", i)
      *Get handle for job
      vHandle=DSAttachJob(vListItem,DSJ.ERRNONE)
      
      IF vHandle<>DSJE.BADHANDLE THEN
         *Check the status to see if it is running
         vStatus=DSGetJobInfo(vHandle,DSJ.JOBSTATUS)
         IF vStatus=DSJS.RUNNING THEN

            *Get the job start timestamp
            vJobStartTimeStamp=DSGetJobInfo(vHandle,DSJ.JOBSTARTTIMESTAMP)

            *Convert the timestamp to seconds since 1970 (internal format)
            TimePart = MatchField(vJobStartTimeStamp,"4N'-'2N'-'2N' '0X",7)
            vJobStartDate=Iconv(left(vJobStartTimeStamp,10),"D-YMD[4,2,2]")
            vJobStartTime=Iconv(TimePart,"MTS:")

            *Calculate difference from current time
            vDiff=(vSystemDate-vJobStartDate)*86400+(vSystemTime-vJobStartTime)
            vDiff=vDiff/60
            
            *Add to list of runaway jobs if it has been running for longer than [aMaxRunningTime]
            IF vDiff>aMaxRunningTime THEN 
               vOut=vOut : vListItem:" (":INT(vDiff):" minutes),"
            END
            
         END
         *Close handle for job
         vHandle=DSDetachJob(vHandle)
      END
   NEXT
END

*Output list 
IF vOut<>"" THEN
   Ans=LEFT(vOut,LEN(vOut)-1)
END
Mike
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Some observations.

1. Internal time is not seconds since 1970.

2. Current date is more easily returned by Date(), current time by Time(). Current timestamp in seconds (since 1967-12-31) is simply

Code: Select all

86400 * Date() + Time()
3. You do not need to preprocess using DCount. Instead of a For loop, use an uncounted loop and process it using REMOVE.

4. Similarly, job start timestamp in seconds since 1967-12-31 is

Code: Select all

86400 * Iconv(Field(vJobStartTimestamp," ",1,1),"DYMD") + Iconv(Field(vJobStartTimestamp, " ", 2, 1), "MTS")
5. Make use of the <-1> notation to build a dynamic array in Ans. For example

Code: Select all

Ans<-1> = vListItem:" (":INT(vDiff):" minutes)" 
In this way you don't need vOut and don't need to strip the trailing delimiter.
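To make points 1, 2 and 4 concrete, here is a small illustrative sketch (the variable names are made up for the example). It shows that day 0 of the internal date is 31 December 1967 and that internal time counts seconds from midnight.

```
* Illustration only: how internal dates and times map to external values.
vDayZero = Oconv(0, "D-YMD[4,2,2]")   ; * day 0 of the internal date: 1967-12-31
vNoon    = Oconv(43200, "MTS")        ; * 43200 seconds after midnight: 12:00:00
vNow     = 86400 * Date() + Time()    ; * seconds since midnight on 1967-12-31
```

Because both the current time and the job start time are expressed on this same scale, their difference divided by 60 gives the elapsed minutes directly.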

Stripped of your excellent documentation, the following code fragment illustrates the technique.

Code: Select all

$IFNDEF JOBCONTROL.H
$INCLUDE DSINCLUDE JOBCONTROL.H
$ENDIF
Ans = ""
vTimeNow = 86400 * Date() + Time()
vJobList = Convert(",", @FM, DSGetProjectInfo(DSJ.JOBLIST))
Loop
   Remove vListItem From vJobList Setting bMoreJobs
   hJob = DSAttachJob(vListItem, DSJ.ERRNONE)
   vStatus = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
   If vStatus = DSJS.RUNNING
   Then
      vJobStartTimestamp = DSGetJobInfo(hJob, DSJ.JOBSTARTTIMESTAMP)
      iJobStartTimestamp = 86400 * Iconv(Field(vJobStartTimestamp, " ", 1, 1), "DYMD") + Iconv(Field(vJobStartTimestamp, " ", 2, 1), "MTS")
      vDiff = (vTimeNow - iJobStartTimestamp) / 60  ; * minutes
      If vDiff >= aMaxRunningTime
      Then
         Ans<-1> = vListItem : "   (" : INT(vDiff) : " minutes.)"
      End
   End
   ErrCode = DSDetachJob(hJob)  ; * release the handle before the next job
While bMoreJobs
Repeat
Ans = Convert(@FM, ",", Ans)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.