Scheduled jobs intermittently don't start

Posted: Thu Feb 10, 2005 9:53 am
by tonystark622
I have noticed that occasionally I have a job sequencer that doesn't run when it's scheduled. It runs ok before and after the event where it didn't run. If I look in the Job Log when this happens I see the entry for DS.SCHED where the scheduler fired off, but no entries at all for the job running. If I look in Director, the last run date/time are those of the last time the job ran successfully. It appears to me that the job just isn't starting at all during this scheduled event. The last time this happened, at least two jobs were affected: both were supposed to start at 5:00am, but neither job actually ran.

Does anyone have any ideas why this might be happening? Or, any ideas on how I can determine why the jobs aren't starting? I did look in the cron log and can see that the process to run the job is called. I also looked in a UNIX system log (I can't remember now what it was called) and didn't see any problems during that time...

Thanks for your help,
Tony

Posted: Thu Feb 10, 2005 11:52 am
by ketfos
Hi,
Are you scheduling using DataStage Director or through a cron job on UNIX?

Ketfos

Posted: Thu Feb 10, 2005 1:26 pm
by tonystark622
Through DataStage Director.

Tony

Posted: Thu Feb 10, 2005 1:38 pm
by chulett
Which creates the cron entry for you. :wink:

Tony and I have already 'talked' about this after he found my Oliver posting where we'd suffered from the same problem. In my case, it went as suddenly as it came and Ascential never could come up with an explanation as to why it was happening. :?

From what I remember, the symptoms were rather odd. The log would have only one new entry in it, the "Starting job xxxx" record. The odd thing was the Status view would still show the information from the previous run, as if it hadn't even tried to start.

Is that what you are seeing, Tony?

Posted: Thu Feb 10, 2005 1:52 pm
by ketfos
Tony,
You can call the job from a shell script and schedule it with cron instead of DataStage Director.
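
A minimal sketch of that approach, written here as a small Python wrapper rather than a shell script. It assumes the `dsjob` command-line client is on the PATH of the cron user; the project name `MyProject`, the job name `SeqNightlyLoad`, and the file names below are hypothetical. `-run` and `-jobstatus` are standard `dsjob` options: with `-jobstatus` the command waits for the job to finish and derives its exit code from the job's finishing status.

```python
"""Run a DataStage job through dsjob so cron can schedule it directly."""
import subprocess
import sys


def build_dsjob_command(project, job, dsjob="dsjob"):
    """Return the argument list for a blocking dsjob run."""
    # -run starts the job; -jobstatus makes dsjob wait for completion
    # and base its exit code on the job's finishing status.
    return [dsjob, "-run", "-jobstatus", project, job]


def main(argv):
    """Run the job named on the command line and return dsjob's exit code."""
    if len(argv) != 3:
        print("usage: run_ds_job.py PROJECT JOB", file=sys.stderr)
        return 64
    cmd = build_dsjob_command(argv[1], argv[2])
    # Hand dsjob's exit code back to the caller so cron output shows it.
    return subprocess.run(cmd).returncode
```

Saved as a script with `sys.exit(main(sys.argv))` as its last line, it could be scheduled with a crontab entry along the lines of `0 5 * * * /path/to/run_ds_job.py MyProject SeqNightlyLoad`. The DataStage environment (normally set up by sourcing `dsenv`) would also have to be available to the cron user.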

Ketfos

Posted: Thu Feb 10, 2005 2:41 pm
by Sainath.Srinivasan
I remember such a problem with DataStage 4.1 on a multi-processor system. Can you please let me know whether you are running on a multi-processor system?

Posted: Thu Feb 10, 2005 3:08 pm
by tonystark622
Craig,

Yes, this is exactly what I'm seeing.

Thanks, Ketfos, if it comes to that we will do that, but I would hate to have to go that route.

Sainath,
Yes, this is a multi-processor system. An HP UNIX system with 8 processors.

My problem, right now, is that I don't know what to look for to troubleshoot this issue any further. We did check the CRON logs and you can see where CRON launched the process that writes the Job Log entry, but the job itself never starts... We even looked at the /var/adm/syslog/syslog.log but there wasn't anything in there at all close to the time that this job was supposed to run.

Thanks everyone,
Tony

Posted: Thu Feb 10, 2005 3:22 pm
by Sainath.Srinivasan
To get more clarity: does the job just prior to the one that did not run complete successfully but finish in a very short time, something like a few seconds?

Posted: Thu Feb 10, 2005 3:28 pm
by tonystark622
No. This is a job sequencer job that is scheduled to start at 5:00am. At 7:30am someone called to my attention that they hadn't received an email report from this job as they usually did. I checked and what I saw in the Job Log for this Job Sequencer job was the DS.SCHED entry at 5:00am. There was literally nothing after that. I expected to see all the stuff from the job in there after the DS.SCHED entry.
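
The symptom described here (a DS.SCHED entry at the scheduled time, while the last run time still points at the previous run) could be turned into an automated check, so that a missed start is noticed well before 7:30am. What follows is only a sketch: `missed_start` and its parameters are hypothetical names, the dates are illustrative, and how the last start time is obtained (for example from `dsjob -jobinfo` output or from the Director status view) is left out.

```python
from datetime import datetime, timedelta


def missed_start(scheduled_at, last_started, now, grace=timedelta(minutes=15)):
    """Return True if the run scheduled for `scheduled_at` apparently never began.

    scheduled_at: when the scheduler was supposed to start the job
    last_started: start time of the most recent run known to have begun
                  (None if the job has never run)
    now:          current time
    grace:        how long to wait before treating a missing start as a problem
    """
    if now < scheduled_at + grace:
        return False  # too early to judge
    # A run counts for this schedule slot only if it began at or after it.
    return last_started is None or last_started < scheduled_at


# Illustrative values: a 5:00am schedule, last known run the previous day.
scheduled = datetime(2005, 2, 10, 5, 0)
previous_run = datetime(2005, 2, 9, 5, 0)
alarm = missed_start(scheduled, previous_run, now=datetime(2005, 2, 10, 5, 20))
print(alarm)  # True
```

Run from cron a little after each scheduled start, a check like this could send a mail or restart the job when it returns True.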

Let me know if you have further questions and I'll try my best to clarify my situation.

Thanks for your help,
Tony

Posted: Thu Feb 10, 2005 3:29 pm
by chulett
No, not if it's like what I was seeing. Previous runs would be complete and error free and then suddenly... nothing. The next day it would run fine. :?

This was on an Alpha running Tru64, so it isn't something unique to HP/UX.

Posted: Thu Feb 10, 2005 3:35 pm
by Sainath.Srinivasan
The reason for this in version 4.1 was that on multi-processor systems, when a job is given a process ID and the next job comes up immediately afterwards and mistakenly obtains the same PID, the engine gets confused: it takes the completion of the first job to be the completion of the second.

This leads to the start symbol and no further processing.
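
To make the mechanism concrete, here is a purely illustrative toy model. It is not DataStage engine code, and every name in it (`CompletionTracker`, `job_A`, `job_B`, the PID value) is hypothetical. It only shows how a tracker that identifies running jobs by PID alone would credit the wrong job with a completion when two jobs end up registered under the same PID.

```python
class CompletionTracker:
    """Toy tracker that identifies running jobs by PID alone."""

    def __init__(self):
        self.running = {}    # pid -> job name
        self.finished = []   # job names the tracker believes are done

    def start(self, job, pid):
        # A second job registered under the same PID silently replaces the first.
        self.running[pid] = job

    def process_exited(self, pid):
        # The tracker only sees "PID n exited" and credits whoever holds n now.
        job = self.running.pop(pid, None)
        if job is not None:
            self.finished.append(job)
        return job


tracker = CompletionTracker()
tracker.start("job_A", pid=4242)
tracker.start("job_B", pid=4242)         # same PID handed out again
credited = tracker.process_exited(4242)  # job_A's process ends
print(credited)                          # job_B
```

In this model `job_B` is recorded as finished even though it has only just been registered, which matches the description of a job that shows a start and then nothing else.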

Posted: Thu Feb 10, 2005 3:40 pm
by ray.wurlod
I've noticed (7.1 on AIX) that the job entries are to be found in atjobs rather than in cronjobs. But this really addresses the "can't schedule" problem rather than the "won't start" problem.

Posted: Thu Feb 10, 2005 3:53 pm
by tonystark622
ray.wurlod wrote:
I've noticed (7.1 on AIX) that the job entries are to be found in atjobs rather than in cronjobs. But this really addresses the "can't schedule" problem rather than the "won't start" problem.

Ray, the only time I've dealt with 'at' was when I scheduled a non-recurring job. The scheduler uses 'at' for non-recurring jobs, rather than 'cron'.

I did see the entry in the cron log where it started the "dsr_sched.sh" process for that job. I also saw the DS.SCHED entry in the Job Log for that job, but nothing else.

Tony

Posted: Thu Feb 10, 2005 3:57 pm
by tonystark622
chulett wrote:
No, not if it's like what I was seeing. Previous runs would be complete and error free and then suddenly... nothing. The next day it would run fine.

Yes, Craig. This is exactly what I'm seeing.

Tony