Detecting job restart within a Job Sequence

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

GavMagill
Participant
Posts: 14
Joined: Sun Mar 28, 2004 2:43 pm
Location: Auckland, New Zealand

Post by GavMagill »

Hi all.

I am putting together a restartable job sequence and would like to add a step at the start of the job that detects whether it is restarting after an abort, so that I can write this information to an audit record from within the job. DataStage Director natively detects when a job sequence is restarting after an abort, and the Job Control tab of a restartable job sequence contains the following line of code, which also seems to do restart detection:

"If DSCheckPointExists(DSJ.ME, cp$dtm) Then GoTo L$RESTART"

I have tried building a simple routine, called from a Routine Activity, that runs this function, but it returns False when I pass it the parent job sequence's job handle during a restart after an abort. Is there some other way to interrogate the return value of DSCheckPointExists from within the job, or is there a simpler method of detecting when a job sequence is being restarted (from within the job)?

I have searched the archives under "checkpoints", as this seems to be the best keyword for restartable job sequences, but have not found anyone trying to do something similar. I have also looked through the documentation but have not found any functionality that would let me do this.

I thought this would be relatively simple, and I am hoping I have not missed something obvious. If I have, I apologise in advance for wasting people's time, but I would certainly appreciate being pointed in the right direction. Otherwise, if anyone has a solution or suggestion, I would appreciate any assistance you can give me.

Thanks in advance.
Gavin
Gavin Magill
ETL Developer
+6427 291 0525
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Can you show us how you got the "parent JS jobhandle"? I'm assuming you've applied DSAttachJob() to the name returned by the DSJobController macro (or the equivalent call to DSGetJobInfo()).

The other way is to check for the existence of a CHECKPOINT record in the RT_STATUSnnn hashed file for the job. This will exist when in a restart situation and will not exist in a clean start situation. Of course, you need the job number from DS_JOBS for this approach.
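For readers unsure what Ray is assuming in his first paragraph, here is a minimal sketch of attaching to the parent sequence by name. It assumes the code runs in a job started by the sequence, so that DSGetJobInfo(DSJ.ME, DSJ.JOBCONTROLLER) returns the sequence's name; the routine name 'GetParentSketch' is hypothetical, and the JOBCONTROL.H include is left commented out because whether your routine type already includes it is an assumption to check.

```basic
* $INCLUDE DSINCLUDE JOBCONTROL.H   ;* uncomment if DSJ.* constants are not defined
* Hypothetical sketch: attach to the sequence that started the current job.
RoutineName = 'GetParentSketch'
Ans = ''
* Name of the job controlling this one (assumed to be the sequence)
ParentName = DSGetJobInfo(DSJ.ME, DSJ.JOBCONTROLLER)
* DSJ.ERRNONE: do not abort on failure; test the handle instead
hParent = DSAttachJob(ParentName, DSJ.ERRNONE)
If Not(hParent) Then
   Call DSLogWarn('Unable to attach to ' : ParentName, RoutineName)
End Else
   * ... pass hParent to whatever needs the sequence's handle ...
   Ans = ParentName
   ErrCode = DSDetachJob(hParent)
End
```

The Not(handle) test mirrors the pattern used in generated job control code after DSAttachJob with DSJ.ERRNONE; detaching afterwards releases the handle.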
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I've looked at the same code and have very little idea what cp$dtm would be. That's the problem with going 'under the covers' and using undocumented internal functions not meant for us mere mortals. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod

Post by ray.wurlod »

cp$dtm is just a variable name. You can find its assignment statement earlier in the job control code.
chulett

Post by chulett »

I know... and no, I couldn't - hence my comment. It was referenced in the DSCheckPointExists() and DSLogInfo() calls but nowhere else. :?
-craig
ray.wurlod

Post by ray.wurlod »

Check the included header files? EQUATE declaration? (I don't have any restartable sequences at the moment, so can't check.)
chulett

Post by chulett »

It does a single EQU and an INCLUDE on DSJ_XFUNCS.H but there's not much to it. I haven't done an exhaustive search yet but haven't had any luck where I have looked. :(
-craig
GavMagill

Post by GavMagill »

Hi Ray / Craig

Thanks for your input. I have managed to kludge together a routine that checks the RT_STATUS hashed file to see whether the CHECKPOINT row has been set, and this gives me the answer I need for the time being. I have to say my BASIC skills are pretty rusty, so it took longer than it should have to get it up and running, but it is working OK.

I went down the path of trying to find what sets the cp$dtm variable and, like Craig, I was not able to figure out where the EQUATE statement occurs, hence my original posting above asking for help.

Using the RT_STATUS file feels a bit clunky, and I will need to make sure it keeps working after later upgrades, but for the moment it works for me, so I just wanted to say thank you for everyone's assistance.

Regards
Gavin
chulett

Post by chulett »

I'd wager that the STATUS hashed file is indeed the same thing the routine checks against. And if you'd like to post your routine code, we'd be glad to rip it to^H^H^H err, have a look at it and give you some pointers. :wink:
-craig
vrishabhsagar
Participant
Posts: 33
Joined: Mon Nov 12, 2007 1:02 am
Location: Bangalore

Unix approach.

Post by vrishabhsagar »

Since your server runs on Unix, may I suggest a Unix approach? Something like a shell script that queries dsjob -logsum and dsjob -logdetail and parses the previous run's log to determine how that run ended could satisfy your requirements.

I once did this (I wrote a shell function to find the name of the aborted job from the current sequence run; I can PM the code if you want to have a look). It was horribly gruesome and there were a LOT of issues, but I was able to manage the trick.
Rishabh Sagar V
Bangalore
ray.wurlod

Post by ray.wurlod »

Nah. Horribly gruesome, as you admit. Much easier to check for the existence of the CHECKPOINT record.

There is added complexity for multi-instance jobs, which each have a CHECKPOINT record. But viewing the RT_STATUSnnn hashed file will reveal the naming convention.

The basis of the technique is:

Code:

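* Name of the current job
JobName = DSGetJobInfo(DSJ.ME, DSJ.JOBNAME)
* Field 5 of the job's DS_JOBS record holds the job number
JobNumber = Trans("DS_JOBS", JobName, 5, "X")
* Status hashed file is RT_STATUSnnn (Convert strips any spaces)
StatusName = Convert(" ", "", "RT_STATUS" : JobNumber)
* Field 0 returns the record key, so a non-empty result means the record exists
IsRestarting = (Trans(StatusName, JobName : ".CHECKPOINT", 0, "X") > "")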
Last edited by ray.wurlod on Thu Jun 04, 2009 3:56 pm, edited 1 time in total.
GavMagill

Post by GavMagill »

I am sitting at home at the moment so don't have access to the exact source for the routine but it is basically along the lines of the following. Note the parameters passed are Arg1=JobNo and Arg2=JobName.

The routine is called in a User Variables Activity right at the start of the job sequence. Because the sequence is restartable, the job control logic always creates a CHECKPOINT record in the RT_STATUSnnn hashed file before it processes any stages, but the fifth field of that record stays empty until the first checkpoint is processed (which is the case on a clean, non-restart run). If the job fails and is then restarted, the fifth field of the CHECKPOINT record is non-empty, hence the test for a length greater than 0. I realise I need to put error handling around the Open and the Read, but for testing purposes this did the job.

Code:

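      RoutineName = 'TestIsJobRestarting'
      * Arg1 = job number (the nnn in RT_STATUSnnn), Arg2 = job name
      Ans = 'N'
      V_RTHashFileName = 'RT_STATUS' : Arg1
      K_HashFileKey = Arg2 : '.CHECKPOINT'
      Open V_RTHashFileName To F_RTStatusHashFile Then
         Read V.Record From F_RTStatusHashFile, K_HashFileKey Then
            * Field 5 stays empty until the first checkpoint is written,
            * so a non-empty value means the sequence is restarting
            If LEN(V.Record<5>) > 0 Then Ans = 'Y'
         End
         Close F_RTStatusHashFile
      End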

Also, with regard to dropping out to a Unix script: I am trying very hard to avoid doing anything outside the DataStage user interface in this implementation, and to keep the use of routines to an absolute minimum, though for some things there isn't much option. The idea is that other developers on the project hopefully won't need to go poking around behind the scenes to figure out what is going on when maintaining any of the jobs. (This may of course be wishful thinking, but I am trying hard to make it so.)

Please feel free to comment on the above but be gentle, I am way outa practice. :wink:

PS: Ray, I am sure your code is way more efficient than mine, but I need to work out exactly what it is doing first. (I haven't used the Trans function before.) :shock:
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

While I'm a bit late to this one, I've been looking into DSCheckPointExists and have, to an extent, figured it out. cp$dtm is no more than a timestamp variable that the function passes back; I can only guess it is the time the checkpoint was created. All you need to do is declare an empty variable, pass it in, and then interrogate it afterwards.
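Based on the description above, a call might look like the sketch below. DSCheckPointExists is undocumented, so its signature and behaviour here are inferred only from the generated job control line quoted at the top of the thread and from this post. CpTime and 'RestartCheckSketch' are hypothetical names, and using DSJ.ME assumes the code runs in the sequence's own job control context; as Gavin found, calling it from a separate routine may not give the same answer.

```basic
* Sketch only: DSCheckPointExists is an undocumented internal function.
* Usage mirrors the generated job control line:
*   If DSCheckPointExists(DSJ.ME, cp$dtm) Then GoTo L$RESTART
CpTime = ''   ;* empty variable; per the post above, returned as a timestamp
If DSCheckPointExists(DSJ.ME, CpTime) Then
   Call DSLogInfo('Restarting; checkpoint time: ' : CpTime, 'RestartCheckSketch')
End Else
   Call DSLogInfo('Clean start, no checkpoint found', 'RestartCheckSketch')
End
```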

Figured I'd post this in case someone else wonders about it.