Scheduled jobs intermittently don't start

tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

I have noticed that occasionally a job sequencer of mine doesn't run when it's scheduled. It runs fine before and after the missed run. When this happens, the Job Log shows the DS.SCHED entry where the scheduler fired, but no entries at all for the job itself. In Director, the last run date/time still shows the last successful run. It appears to me that the job just isn't starting at all at the scheduled time. The last time this happened, at least two jobs were affected: both were supposed to start at 5:00am, but neither actually ran.

Does anyone have any ideas why this might be happening? Or, any ideas on how I can determine why the jobs aren't starting? I did look in the cron log and can see that the process to run the job is called. I also looked in a UNIX system log (I can't remember now what it was called) and didn't see any problems during that time...

Thanks for your help,
Tony
ketfos
Participant
Posts: 562
Joined: Mon May 03, 2004 8:58 pm
Location: san francisco

Post by ketfos »

Hi,
Are you scheduling through DataStage Director or through a cron job on UNIX?

Ketfos
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

Through DataStage Director.

Tony
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Which creates the cron entry for you. :wink:

Tony and I have already 'talked' about this after he found my Oliver posting where we'd suffered from the same problem. In my case, it went as suddenly as it came and Ascential never could come up with an explanation as to why it was happening. :?

From what I remember, the symptoms were rather odd. The log would have only one new entry in it, the "Starting job xxxx" record. The odd thing was the Status view would still show the information from the previous run, as if it hadn't even tried to start.

Is that what you are seeing, Tony?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ketfos
Participant
Posts: 562
Joined: Mon May 03, 2004 8:58 pm
Location: san francisco

Post by ketfos »

Tony,
You can call the job from a shell script and schedule that script with cron instead of using DataStage Director.
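
A minimal sketch of what that could look like, written in Python only to show the shape of the command line and the crontab entry. The project name, job name and log path are made up for illustration. It assumes the `dsjob` command is on the PATH. cron runs with a minimal environment, so a real wrapper would likely need to set up the DataStage environment first.

```python
import shlex


def build_dsjob_command(project, job, dsjob_path="dsjob"):
    """Return the argv list that runs one job and waits for its finishing status."""
    # -run starts the job; -jobstatus waits for it and reflects the status in the exit code.
    return [dsjob_path, "-run", "-jobstatus", project, job]


def build_crontab_line(minute, hour, command, log_file):
    """Return a daily crontab entry that appends stdout and stderr to log_file."""
    quoted = " ".join(shlex.quote(part) for part in command)
    return f"{minute} {hour} * * * {quoted} >> {shlex.quote(log_file)} 2>&1"


if __name__ == "__main__":
    # Hypothetical project and job names.
    cmd = build_dsjob_command("MyProject", "SeqDailyReport")
    print(build_crontab_line(0, 5, cmd, "/tmp/SeqDailyReport.cron.log"))
```

Redirecting output to a file at least leaves a trace if the command fails to start the job, which is the piece of evidence missing in this thread.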

Ketfos
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

I remember such a problem with DataStage 4.1 on multi-processor systems. Can you please let me know whether you are running on a multi-processor system?
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

Craig,

Yes, this is exactly what I'm seeing.

Thanks, Ketfos, if it comes to that we will do that, but I would hate to have to go that route.

Sainath,
Yes, this is a multi-processor system. An HP UNIX system with 8 processors.

My problem, right now, is that I don't know what to look for to troubleshoot this issue any further. We did check the CRON logs and you can see where CRON launched the process that writes the Job Log entry, but the job itself never starts... We even looked at the /var/adm/syslog/syslog.log but there wasn't anything in there at all close to the time that this job was supposed to run.

Thanks everyone,
Tony
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

To clarify: does the job that ran just before the one that failed to start complete successfully, but finish in a very short time, something like a few seconds?
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

No. This is a job sequencer job that is scheduled to start at 5:00am. At 7:30am someone called to my attention that they hadn't received an email report from this job as they usually did. I checked, and what I saw in the Job Log for this Job Sequencer job was the DS.SCHED entry at 5:00am. There was literally nothing after that. I expected to see all the entries from the job run after the DS.SCHED entry.
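
The check described here (a DS.SCHED entry with nothing after it) could be automated roughly as below, so a missed run is noticed before someone phones about a missing report. This is only a sketch: it assumes the log entries have already been exported by some means into `(timestamp, source)` tuples, which is a made-up format, and the five-minute grace period is arbitrary.

```python
from datetime import datetime, timedelta


def find_missed_runs(entries, grace=timedelta(minutes=5)):
    """Return timestamps of scheduler entries not followed by job activity within `grace`.

    `entries` is a list of (timestamp, source) tuples. A source of "DS.SCHED"
    marks the scheduler record; any other source counts as job activity.
    """
    ordered = sorted(entries)
    missed = []
    for i, (ts, source) in enumerate(ordered):
        if source != "DS.SCHED":
            continue
        # Look for any non-scheduler entry shortly after the scheduler fired.
        followed = any(
            other_source != "DS.SCHED" and ts <= other_ts <= ts + grace
            for other_ts, other_source in ordered[i + 1:]
        )
        if not followed:
            missed.append(ts)
    return missed


# Hypothetical three days of log entries; day 2 has only the scheduler record.
sample = [
    (datetime(2005, 3, 1, 5, 0, 0), "DS.SCHED"),
    (datetime(2005, 3, 1, 5, 0, 2), "JOB"),
    (datetime(2005, 3, 2, 5, 0, 0), "DS.SCHED"),
    (datetime(2005, 3, 3, 5, 0, 0), "DS.SCHED"),
    (datetime(2005, 3, 3, 5, 0, 1), "JOB"),
]
print(find_missed_runs(sample))
```

With the sample data only the 5:00am entry on the second day is reported.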

Let me know if you have further questions and I'll try my best to clarify my situation.

Thanks for your help,
Tony
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, not if it's like what I was seeing. Previous runs would be complete and error free and then suddenly... nothing. The next day it would run fine. :?

This was on an Alpha running Tru64, so it isn't something unique to HP/UX.
-craig

Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

In version 4.1, the cause was this: on multi-processor systems, a job would be given a process ID, and the next job, starting immediately afterwards, could mistakenly obtain the same PID. The engine then confused the two and took the completion of the first job to be the completion of the second.

This leads to the start symbol appearing and no further processing.
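
A toy model of that failure mode, for illustration only. It is not the engine's code, and nothing in this thread confirms that this is what is happening on 7.x. It just shows how bookkeeping keyed on the PID alone credits one job's exit to another.

```python
class PidOnlyTracker:
    """Toy tracker that remembers running jobs by pid alone."""

    def __init__(self):
        self.running = {}   # pid -> job name
        self.finished = []  # job names credited with an exit

    def start(self, pid, job):
        # A second job registered under the same pid silently replaces the first.
        self.running[pid] = job

    def exit(self, pid):
        # The exit is credited to whichever job currently owns the pid.
        job = self.running.pop(pid, None)
        if job is not None:
            self.finished.append(job)
        return job


tracker = PidOnlyTracker()
tracker.start(4321, "JobA")
tracker.start(4321, "JobB")    # hypothetical collision on the same pid
credited = tracker.exit(4321)  # JobA's process ends
print(credited)                # JobB
```

In this model the second job is marked finished before it has done anything, which would match a log that shows a start and nothing more.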
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I've noticed (7.1 on AIX) that the job entries are to be found in atjobs rather than in cronjobs. But this really addresses the "can't schedule" problem rather than the "won't start" problem.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

ray.wurlod wrote: "I've noticed (7.1 on AIX) that the job entries are to be found in atjobs rather than in cronjobs. But this really addresses the "can't schedule" problem rather than the "won't start" problem."
Ray, the only time I've dealt with 'at' was when I scheduled a non-recurring job. The scheduler uses 'at' for non-recurring jobs, rather than 'cron'.

I did see the entry in the cron log where it started the "dsr_sched.sh" process for that job. I also saw the DS.SCHED entry in the Job Log for that job, but nothing else.

Tony
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

chulett wrote: "No, not if it's like what I was seeing. Previous runs would be complete and error free and then suddenly... nothing. The next day it would run fine."
Yes, Craig. This is exactly what I'm seeing.

Tony