Detecting job restart within a Job Sequence
Hi all.
I am putting together a restartable job sequence and would like to add a step at the start of the job that detects whether the job is restarting after an abort, so that I can write this information to an audit record from within the job. The DataStage Director can natively detect when a job sequence is restarting after an abort, and in the Job Control tab of a restartable Job Sequence the following line of code also appears to do restart detection.
"If DSCheckPointExists(DSJ.ME, cp$dtm) Then GoTo L$RESTART"
I have tried building a simple routine, called from a Routine Activity, to run this function, but it returns False when I pass the parent JS jobhandle during an abort restart. What I am wondering is: can I interrogate the return from the DSCheckPointExists function some other way from within the job, or is there some simpler method of detecting when a Job Sequence is being restarted (from within the job)?
I have searched through the Archives looking under Checkpoints as this seems to be the best key word to use for Restartable Job Sequences but have not been able to find a reference to someone trying to do something similar. I have also looked through the doco but have not found any functionality which will enable me to do this.
I thought this would be something relatively simple to do and I am hoping I have not missed something obvious that I should know about. If I have, I apologise in advance for wasting people's time, but I would certainly appreciate being pointed in the right direction. Otherwise, if anyone has a solution or suggestion, I would appreciate any assistance you can give me.
Thanks in advance.
Gavin
Gavin Magill
ETL Developer
+6427 291 0525
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Can you show us how you got the "parent JS jobhandle"? I'm assuming you've applied DSAttachJob() to the name returned by the DSJobController macro (or an equivalent call to DSGetJobInfo()).
The other way is to check for the existence of a CHECKPOINT record in the RT_STATUSnnn hashed file for the job. This will exist when in a restart situation and will not exist in a clean start situation. Of course, you need the job number from DS_JOBS for this approach.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Hi Ray / Craig
Thanks for your input. I have managed to kludge together a routine which checks the RT_STATUS hashed file to see whether the CHECKPOINT row has been set, and this is giving me the answer I need for the time being. I have to say my BASIC skills are pretty rusty, so it took longer than it should have to get it up and running, but it is working OK.
I went down the path of looking at whether I could find what sets the cp$dtm variable and, like Craig, I was not able to figure out where the EQUATE statement occurs, hence my original posting above to ask for help.
Using the RT_STATUS file feels a bit clunky, and I will need to make sure it continues to work in later upgrades, but for the moment it is working for me, so I just wanted to say thank you for everyone's assistance.
Regards
Gavin
Gavin Magill
ETL Developer
+6427 291 0525
-
- Participant
- Posts: 33
- Joined: Mon Nov 12, 2007 1:02 am
- Location: Bangalore
Unix approach.
Since your server runs on Unix, may I suggest a Unix approach? Something like a shell script that queries dsjob -logsum and dsjob -logdetail and parses the previous run's log to determine the status of the previous run could satisfy your requirements.
I once did this (I wrote a shell function to find the name of the aborted job from the current sequence run; I can PM the code if you want to have a look). It was horribly gruesome and there were a LOT of issues, but I was able to manage the trick.
Rishabh Sagar V
Bangalore
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Nah. Horribly gruesome, as you admit. Much easier to check for the existence of the CHECKPOINT record.
There is added complexity for multi-instance jobs, which each have a CHECKPOINT record. But viewing the RT_STATUSnnn hashed file will reveal the naming convention.
The basis of the technique is:
Code: Select all
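* Name of the job this code is running in
JobName = DSGetJobInfo(DSJ.ME, DSJ.JOBNAME)
* Look up the job number in DS_JOBS; "X" returns an empty string if the lookup fails
JobNumber = Trans("DS_JOBS", JobName, 5, "X")
* Build the status file name (RT_STATUSnnn), stripping any spaces
StatusName = Convert(" ", "", "RT_STATUS" : JobNumber)
* True when the lookup of the <JobName>.CHECKPOINT record returns a non-empty value
IsRestarting = (Trans(StatusName, JobName : ".CHECKPOINT", 0, "X") > "")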
Last edited by ray.wurlod on Thu Jun 04, 2009 3:56 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I am sitting at home at the moment so don't have access to the exact source for the routine, but it is basically along the lines of the following. Note the parameters passed are Arg1=JobNo and Arg2=JobName.
The routine is called in a UserVariables Activity right at the start of the Job Sequence being run. As the Job Sequence is restartable, the Job Control logic always creates a CHECKPOINT record in the RT_STATUSnnn hashed file before it processes any stages, but the fifth field is empty until a first checkpoint is processed (this is the case when it is run from a non-restart state). If the job fails and is then restarted, the fifth field of the CHECKPOINT record will be more than zero bytes long (hence the test for a length greater than 0). I realise I need to put error handling around the Open and the Read, but for testing purposes this did the job.
Code: Select all
RoutineName = 'TestIsJobRestarting'
Ans = 'N'                                  ;* default: not a restart
V_RTHashFileName = 'RT_STATUS' : Arg1      ;* Arg1 = job number
K_HashFileKey = Arg2 : '.CHECKPOINT'       ;* Arg2 = job name
Open V_RTHashFileName To F_RTStatusHashFile Then
   Read V.Record From F_RTStatusHashFile, K_HashFileKey Then
      * A non-empty fifth field indicates a restart
      If LEN(V.Record<5>) > 0 Then Ans = 'Y'
   End
   Close F_RTStatusHashFile
End
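The routine above leaves out error handling around the Open and the Read. A minimal sketch of one way it might be added follows, assuming the same Arg1 (job number) and Arg2 (job name) arguments. The variable names and log messages are illustrative and not taken from the original routine, and treating a missing record or an unopenable file as "not a restart" is an assumption that may not suit every audit requirement.

```basic
RoutineName = 'TestIsJobRestarting'
Ans = 'N'                                 ;* default: treat as a clean start
StatusFileName = 'RT_STATUS' : Arg1       ;* Arg1 = job number (assumed)
CheckpointKey = Arg2 : '.CHECKPOINT'      ;* Arg2 = job name (assumed)
Open StatusFileName To F.Status Then
   Read CheckpointRec From F.Status, CheckpointKey Then
      * A non-empty fifth field is taken to mean a restart
      If Len(CheckpointRec<5>) > 0 Then Ans = 'Y'
   End Else
      * No CHECKPOINT record found: log it and leave Ans as 'N'
      Call DSLogInfo('No record ' : CheckpointKey : ' in ' : StatusFileName, RoutineName)
   End
   Close F.Status
End Else
   * Status file could not be opened: warn and leave Ans as 'N'
   Call DSLogWarning('Unable to open ' : StatusFileName, RoutineName)
End
```

Logging with DSLogWarning rather than aborting keeps the sequence running when the status file cannot be read; whether that is the right trade-off depends on how much the audit record is relied upon.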
Also, with regard to dropping out to a Unix script: I am trying very hard to avoid doing anything outside of the DS user interface with this implementation. I am trying to keep the use of routines to an absolute minimum, but for some things there isn't much option. The idea is that other developers on the project hopefully won't need to go poking around behind the scenes to figure out what is going on when maintaining any of the jobs. (This of course may be just wishful thinking, but I am trying hard to make it so.)
Please feel free to comment on the above but be gentle, I am way outa practice. ;)
PS: Ray, I am sure your code is way more efficient than mine but I just need to work out exactly what it is doing first. (I haven't used the Trans command before.) :o
Gavin Magill
ETL Developer
+6427 291 0525
While a bit late in on this one, I've been looking into DSCheckPointExists and have, to an extent, figured it out. The cp$dtm argument is no more than a timestamp field that the function passes back; I can only guess it is the time the checkpoint was created. All you need to do is declare an empty variable, pass it in, and interrogate it afterwards.
Figured I'd post just in case someone else wonders about it.