Sequence thinks job is not runnable



ldesilvio
Premium Member
Posts: 32
Joined: Thu Mar 27, 2003 6:38 pm
Location: Sewell, NJ

Sequence thinks job is not runnable

Post by ldesilvio »

Hi,

We are seeing intermittent, weird behavior with job sequences. We're running DS 7.5 on Windows 2000 Server. Occasionally, a sequence will abort because it can't attach a job, yet the log for that job says the previous run was successful. Here's some additional info:

* The sequence is part of a group of sequences that are looped by a job control. The looping is done because the job stream is waiting for files to arrive. This whole process runs daily for 17 hours.

* This problem occurs maybe twice a week.

* The problem happens to different jobs in different sequences; it seems to be random.

Here's the log message from the latest sequence that failed:

ServLinkMasterControl3..JobControl (@ServLinkComExtFieldsHash): Controller problem: Error calling DSAttachJob(ServLinkComExtFieldsHash)

(DSGetJobInfo) Failed to open RT_LOG9 file.

(DSOpenJob) Cannot open job ServLinkComExtFieldsHash. - not a runnable job

I verified that the job ServLinkComExtFieldsHash was in a runnable state when the sequence tried to call it. I also checked that RT_LOG9 exists and that there is a record for ServLinkComExtFieldsHash in DS_JOBS. Both are there. Has anyone experienced this before?
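
For context, the attach the sequence performs corresponds roughly to the DataStage BASIC pattern below. This is a sketch only, not the exact code the sequence generates; the error handling shown is illustrative.

    * Rough sketch of the attach/run step a sequence's generated
    * job control performs; not the exact generated code.
    hJob = DSAttachJob("ServLinkComExtFieldsHash", DSJ.ERRNONE)
    If NOT(hJob) Then
       * This is the point where "not a runnable job" surfaces
       Call DSLogWarn("Attach failed: " : DSGetLastErrorMsg(), "JobControl")
    End Else
       ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
       ErrCode = DSWaitForJob(hJob)
       ErrCode = DSDetachJob(hJob)
    End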
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Yes, I have seen this type of problem before - it happens when calling a multi-instance job with the same invocation ID as one that is already running. The first time I saw this it was in a scenario as confusing as yours - sometimes it would work and sometimes it wouldn't, and it was difficult to pin down why it was so sporadic.

If this is a multi-instance job, then I would start looking there.
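
For example (the invocation ID below is hypothetical, purely for illustration):

    * Attaching one invocation of a multi-instance job. "DAILY" is
    * a hypothetical invocation ID; if that invocation is already
    * running, the attach fails much like the error shown above.
    hJob = DSAttachJob("ServLinkComExtFieldsHash.DAILY", DSJ.ERRNONE)
    If NOT(hJob) Then
       Call DSLogWarn("Attach failed - invocation may still be running", "JobControl")
    End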
ldesilvio
Premium Member
Posts: 32
Joined: Thu Mar 27, 2003 6:38 pm
Location: Sewell, NJ

Post by ldesilvio »

Arnd,

Thanks for the quick response. I double-checked the jobs and none are multi-instance. And that job only gets called by that one sequence, so there's no issue of two sequences calling the same job. Another thing I did was check disk space on the server; there's over 100 GB available, so that's not it. Anything else I could check? I'm running out of ideas :(
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Since the log file is present, the failure to open it should only occur if the OPEN statement is executed without the LOCKED option (something Ray always puts in and I never do; I just let it fail in my programs). So if such a statement without a LOCKED clause is used in the DSAttachJob() routine, and someone has an exclusive file lock on the log file at that moment, you might get this error. That's a long chain of maybes, but it could explain the sporadic behavior.
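
To illustrate the locking pattern in UniVerse BASIC (a minimal sketch only; the record ID is hypothetical, and in practice the LOCKED clause appears on statements such as READU rather than on OPEN itself):

    * Minimal sketch of the LOCKED-clause pattern in UniVerse BASIC.
    * The record ID '1' is hypothetical, for illustration only.
    OPEN '', 'RT_LOG9' TO LogFile ELSE STOP 'Cannot open RT_LOG9'
    READU LogRec FROM LogFile, '1' LOCKED
       * Another process holds an exclusive lock; handle it here
       * instead of failing or hanging
       Call DSLogInfo('Log record is locked elsewhere', 'MyRoutine')
    END THEN
       * Lock obtained; LogRec now holds the record
       Call DSLogInfo('Read OK', 'MyRoutine')
    END ELSE
       * Record does not exist (the record lock is still taken)
       Call DSLogInfo('Record not found', 'MyRoutine')
    END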

The only thing I can think of offhand that might take a file lock on the log file is a manual or an automated purge. I don't think your log files are broken, since you said that this happens to different jobs. How busy (slow) is your system when this happens? If it is very busy, the chances of this happening are higher than on a lightly loaded machine.

Do you do a job reset as part of your sequences? A reset does something different depending upon whether the previous run finished without errors or warnings; can you see some rule in this?
ldesilvio
Premium Member
Posts: 32
Joined: Thu Mar 27, 2003 6:38 pm
Location: Sewell, NJ

Post by ldesilvio »

Arnd,

I'm not sure what you are referring to in the first paragraph of your latest reply. The job is being called by a sequence, not a job control. I'm not coding BASIC to perform an OPEN, but your comments about the LOCKED clause are good to know for future reference.

We are doing an automated purge on each run of the sequence, keeping the last 20 runs. While in the loop, the sequence gets called every few minutes, thus performing an automated purge every few minutes. Should we be doing that so often?

All of the jobs are set to "Reset if required, then run"

Thanks
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The first paragraph was too obscure, sorry. The DSAttachJob() routine is written in the BASIC language, as are many DataStage internal functions and routines. I think that a file lock is taken when the job log is cleared, but my system here is too fast to let me confirm that.

Can you change your autopurge settings to remove log entries older than a day or two, instead of removing entries on every run? I am grasping at possibilities here, but it won't cost you much work to try this out.

Can you discern a connection between the jobs that abort and their previous run's completion status (i.e. finished, finished with warnings, or aborted)? The reset takes some file locks and also performs an implicit close and re-attach (normally this is transparent), but it won't do this if the previous run finished without warnings or errors. So if you see a pattern that these sporadic aborts only occur when the previous run had problems, you could narrow down the possible cause.
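
For instance, hand-written job control might express "reset if required, then run" roughly as below (a sketch only; the code a sequence generates differs in detail):

    * Rough sketch of "reset if required, then run" in hand-written
    * job control; the code generated by a sequence differs in detail.
    hJob = DSAttachJob("ServLinkComExtFieldsHash", DSJ.ERRNONE)
    Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
    If Status = DSJS.RUNFAILED Or Status = DSJS.STOPPED Then
       * The reset takes file locks; afterwards the job must be
       * detached and re-attached before it can be run again
       ErrCode = DSRunJob(hJob, DSJ.RUNRESET)
       ErrCode = DSWaitForJob(hJob)
       ErrCode = DSDetachJob(hJob)
       hJob = DSAttachJob("ServLinkComExtFieldsHash", DSJ.ERRNONE)
    End
    ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
    ErrCode = DSWaitForJob(hJob)
    ErrCode = DSDetachJob(hJob)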
ldesilvio
Premium Member
Posts: 32
Joined: Thu Mar 27, 2003 6:38 pm
Location: Sewell, NJ

Post by ldesilvio »

I'll try changing the autopurge settings.

As far as the job statuses go, the last job that aborted had a status of Finished before the sequence tried to run it, yet the sequence thought the job was not runnable for whatever reason. Let me try rerunning with the autopurge settings changed. Since this occurs sporadically, it may be a few days before I get back.

Thanks