Resetting a crashed [status 96] job

Post questions here related to DataStage Server Edition, in areas such as Server job design, DS Basic, Routines, Job Sequences, etc.


ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It's actually RT_STATUSnn, where the suffix nn is the job number (not the DS_ files, which hold design-time information). You need to find the job number in DS_JOBS, then clear the data from the RT_STATUSnn hashed file.
This is possible with commands; substitute the job number returned by the first command for nn in the second:

Code: Select all

SELECT JOBNO FROM DS_JOBS WHERE NAME = 'JobName';
CLEAR.FILE RT_STATUSnn
The same thing can be done in DataStage BASIC. Error handling has been omitted for clarity; you will want to add it.

Code: Select all

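* Look up the job number (field 5 of the DS_JOBS record keyed by job name)
Open "DS_JOBS" To fDS_JOBS
Then
   ReadV JobNo From fDS_JOBS, "JobName", 5
   Then
      * Build the run-time status file name, e.g. RT_STATUS123, stripping any spaces
      StatusFileName = Convert(" ", "", "RT_STATUS" : JobNo)
      Open StatusFileName To fStatusFile
      Then
         * Remove every record from the job's status file
         ClearFile fStatusFile
         Close fStatusFile
         Call DSLogInfo(StatusFileName : " has been cleared.", "MyRoutine")
      End
   End
   * Close DS_JOBS whether or not the ReadV succeeded
   Close fDS_JOBS
End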
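For reference, a sketch of the same logic with the error handling filled in. It assumes the code is the body of a Server routine with a hypothetical argument JobName and that the routine returns 1 (cleared) or 0 (failed) in Ans; the warning texts are illustrative. As noted later in this thread, only clear the status file when you are certain no processes are still associated with the job.

```basic
* Hypothetical routine body: JobName is an assumed routine argument
ErrMsg = ""
Open "DS_JOBS" To fDS_JOBS
Then
   ReadV JobNo From fDS_JOBS, JobName, 5
   Then
      StatusFileName = Convert(" ", "", "RT_STATUS" : JobNo)
      Open StatusFileName To fStatusFile
      Then
         ClearFile fStatusFile
         Close fStatusFile
         Call DSLogInfo(StatusFileName : " has been cleared.", "MyRoutine")
      End
      Else
         ErrMsg = "Cannot open " : StatusFileName
      End
   End
   Else
      ErrMsg = "Job " : JobName : " not found in DS_JOBS"
   End
   Close fDS_JOBS
End
Else
   ErrMsg = "Cannot open DS_JOBS"
End
* Log any failure and return 1 only when the file was cleared
If ErrMsg <> "" Then Call DSLogWarn(ErrMsg, "MyRoutine")
Ans = (ErrMsg = "")
```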
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ldesilvio
Premium Member
Posts: 32
Joined: Thu Mar 27, 2003 6:38 pm
Location: Sewell, NJ

Post by ldesilvio »

Sorry, I meant to say RT_STATUS, RT_CONFIG, and RT_LOG. Yes, I did clear those three files, and the job still shows "Running" as its status.

Post by ldesilvio »

After trying a few different scenarios, I think I have a solution that is not great but works. I think the problem with my BASIC code was that I was calling DSAttachJob on the crashed job, which caused a fatal error because the job was not runnable. My code was:

Code: Select all

JOB.NAME = 'sjob001'
hJob1 = DSAttachJob(JOB.NAME, DSJ.ERRFATAL)
If NOT(hJob1) Then
   Call DSLogFatal("Job Attach Failed: ":JOB.NAME, "JobControl")
   Abort
End
ErrCode = DSGetJobInfo(hJob1, DSJ.JOBSTATUS)
If ErrCode # DSJS.RUNNING Then
   * Setup Batch::UTILCompileJobs, run it, wait for it to finish, and test for success
   hJob2 = DSAttachJob("Batch::UTILCompileJobs", DSJ.ERRFATAL)
   If NOT(hJob2) Then
      Call DSLogFatal("Job Attach Failed:  Batch::UTILCompileJobs", "JobControl")
      Abort
   End
   ErrCode = DSSetParam(hJob2, "AreYouSure", "Y")
   ErrCode = DSSetParam(hJob2, "Folder", "")
   ErrCode = DSSetParam(hJob2, "JobsLike", JOB.NAME)
   ErrCode = DSSetParam(hJob2, "NotCompiledOnly", "N")
   ErrCode = DSSetParam(hJob2, "ClearLog", "N")
   ErrCode = DSSetDisableProjectHandler(hJob2, @FALSE)
   ErrCode = DSSetDisableJobHandler(hJob2, @FALSE)
   ErrCode = DSRunJob(hJob2, DSJ.RUNNORMAL)
   ErrCode = DSWaitForJob(hJob2)
   Status = DSGetJobInfo(hJob2, DSJ.JOBSTATUS)
   If Status = DSJS.RUNFAILED Or Status = DSJS.CRASHED Then
      * Fatal Error - No Return
      Call DSLogFatal("Job Failed: Batch::UTILCompileJobs", "JobControl")
   End
End
I changed it to call Batch::UTILCompileJobs unconditionally, like this:

Code: Select all

JOB.NAME = 'sjob001'
* Setup Batch::UTILCompileJobs, run it, wait for it to finish, and test for success
hJob2 = DSAttachJob("Batch::UTILCompileJobs", DSJ.ERRFATAL)
If NOT(hJob2) Then
   Call DSLogFatal("Job Attach Failed:  Batch::UTILCompileJobs", "JobControl")
   Abort
End
ErrCode = DSSetParam(hJob2, "AreYouSure", "Y")
ErrCode = DSSetParam(hJob2, "Folder", "")
ErrCode = DSSetParam(hJob2, "JobsLike", JOB.NAME)
ErrCode = DSSetParam(hJob2, "NotCompiledOnly", "N")
ErrCode = DSSetParam(hJob2, "ClearLog", "N")
ErrCode = DSSetDisableProjectHandler(hJob2, @FALSE)
ErrCode = DSSetDisableJobHandler(hJob2, @FALSE)
ErrCode = DSRunJob(hJob2, DSJ.RUNNORMAL)
ErrCode = DSWaitForJob(hJob2)
Status = DSGetJobInfo(hJob2, DSJ.JOBSTATUS)
If Status = DSJS.RUNFAILED Or Status = DSJS.CRASHED Then
   * Fatal Error - No Return
   Call DSLogFatal("Job Failed: Batch::UTILCompileJobs", "JobControl")
End
* Release the handle once the compile job has finished
ErrCode = DSDetachJob(hJob2)
If sjob001 is running when Batch::UTILCompileJobs is called, no error occurs, so this can be run at any time. It's not a great solution, though, because it needs to sit in a loop that retries the compile at some time interval.
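A minimal sketch of such a retry loop in the same job control BASIC. MaxPasses and IntervalSecs are hypothetical settings, and stopping after a fixed number of passes is an assumption, since the post does not say when the loop should end. The compile utility is attached and detached on every pass so each run starts from a fresh handle; it assumes the code sits in a job's Job Control code, where the DSJ/DSJS constants are available (a routine may need $INCLUDE DSINCLUDE JOBCONTROL.H).

```basic
* Hypothetical settings: number of compile passes and seconds between them
JOB.NAME = 'sjob001'
MaxPasses = 12
IntervalSecs = 300
For Pass = 1 To MaxPasses
   * Attach a fresh handle to the compile utility (aborts on failure)
   hJob2 = DSAttachJob("Batch::UTILCompileJobs", DSJ.ERRFATAL)
   ErrCode = DSSetParam(hJob2, "AreYouSure", "Y")
   ErrCode = DSSetParam(hJob2, "Folder", "")
   ErrCode = DSSetParam(hJob2, "JobsLike", JOB.NAME)
   ErrCode = DSSetParam(hJob2, "NotCompiledOnly", "N")
   ErrCode = DSSetParam(hJob2, "ClearLog", "N")
   ErrCode = DSSetDisableProjectHandler(hJob2, @FALSE)
   ErrCode = DSSetDisableJobHandler(hJob2, @FALSE)
   ErrCode = DSRunJob(hJob2, DSJ.RUNNORMAL)
   ErrCode = DSWaitForJob(hJob2)
   Status = DSGetJobInfo(hJob2, DSJ.JOBSTATUS)
   ErrCode = DSDetachJob(hJob2)
   If Status = DSJS.RUNFAILED Or Status = DSJS.CRASHED Then
      * Warn and keep looping rather than aborting the controlling job
      Call DSLogWarn("Compile pass " : Pass : " failed for " : JOB.NAME, "JobControl")
   End
   * Wait before the next pass, but not after the last one
   If Pass < MaxPasses Then Sleep IntervalSecs
Next Pass
```

Logging a warning instead of calling DSLogFatal keeps one failed pass from ending the whole loop.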

Post by ray.wurlod »

It is impossible for a cleared status file to report "Running". Did you refresh the view? Did you clear the right RT_STATUSnn file? (Eek!)

You could, with somewhat more finesse, identify the status field for the job plus each stage and resource in the RT_STATUSnn file and reset them individually if you are very certain that there are no processes associated with the job. This is doable with DataStage BASIC (after all "they" do it with DataStage BASIC), but I'm not in a position right now to spend an hour or so researching it all. Sorry.
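One way to check whether the right RT_STATUSnn file was cleared is to count the records left in the job's status file; immediately after a successful clear it should report 0. This is a sketch that assumes a hypothetical variable JobName holding the job's name and reuses the DS_JOBS lookup shown earlier in the thread; -1 means the lookup or open failed.

```basic
* Hypothetical check: JobName is an assumed variable holding the job's name
NumRecs = -1
Open "DS_JOBS" To fDS_JOBS
Then
   ReadV JobNo From fDS_JOBS, JobName, 5
   Then
      StatusFileName = Convert(" ", "", "RT_STATUS" : JobNo)
      Open StatusFileName To fStatusFile
      Then
         * Build select list 0 over every record key, then count the keys
         NumRecs = 0
         Select fStatusFile
         Loop
            ReadNext Id Else Exit
            NumRecs += 1
         Repeat
         Close fStatusFile
      End
   End
   Close fDS_JOBS
End
Call DSLogInfo(JobName : ": " : NumRecs : " record(s) in status file (-1 = lookup failed)", "CheckStatus")
```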

Post by ldesilvio »

I may not have refreshed the view in Director. I'll have to play some more and post the results here later.