Slowness in kicking off jobs

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

EJRoufs
Participant
Posts: 73
Joined: Tue Aug 19, 2003 2:12 pm
Location: USA

Post by EJRoufs »

We have a problem with our jobs often just sitting there and waiting, and I am wondering what the cause is. Sometimes we spend more time waiting for the jobs to "kick off" than we do waiting for them to run, which is rather ridiculous when we're running jobs where time is critical.

Here is an example of one I am watching right now:

RunCreateGEACCMRCurr..JobControl (DSRunJob): Waiting for job HyperionGEACYTD to start

HyperionGEACYTD is kicked off in a Sequencer job. It was sent the message to start at 11:53. 10 minutes later, HyperionGEACYTD is still "waiting" to start running. This seems to happen fairly often. It WILL kick off eventually, I'm not worried about that. I am wondering what the delay is, though, and what we could do to remedy it.

Thanks! :)
Eric
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

This is usually a symptom of machine overload. What else is happening on the machine? For example, is it a domain controller as well? Those kinds of activities run at a higher priority than all other processes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

How do you determine if the job is "running"? What is the status of the job in Director? Are there stages that say "Starting" in the Monitor that contain SQL? Are there before-job routines that are executing? What's in the logs of the jobs? Have you checked the DS server to see what processes are currently running? Have you used Performance Monitor in Windoze to see the CPU and disk resources on the server?
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
EJRoufs
Participant
Posts: 73
Joined: Tue Aug 19, 2003 2:12 pm
Location: USA

Post by EJRoufs »

kcbland wrote: How do you determine if the job is "running"? What is the status of the job in Director? Are there stages that say "Starting" in the Monitor that contain SQL? Are there before-job routines that are executing? What's in the logs of the jobs? Have you checked the DS server to see what processes are currently running? Have you used Performance Monitor in Windoze to see the CPU and disk resources on the server?
I look at the Sequencer to see that it has basically sent the message to the Job to run. Then, I open the Job, turn Diagnostics on, and wait for it to actually start running. It took 12 minutes this time.

No SQL in the job. One small before-job routine that reads in a couple of parameters.

The job log for the Sequencer says that it started waiting for the Job to start at 11:53 in this example. The log for the Job doesn't say anything until the Job actually starts, which was 12:05 in this example.

Most of us developers here don't have the kind of access to the DataStage server that would let us do that type of monitoring on the server. I can do some, though. We have 8 CPUs on that box. They seem to be running just fine and are hardly strained. There is also plenty of memory available. Other than "System" things going on, there appear to be 5 of us logged onto the box, with probably only me running jobs at the moment.
Eric
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If the job itself didn't actually start in 60/90 seconds, it would throw a timeout error of -14. It *is* actually starting, and the 12 minute "delay" you are seeing before the monitor kicks in is the amount of time it is taking your before-job routine to run.

Simple and small as it may be, it is the culprit.
-craig

"You can never have too many knives" -- Logan Nine Fingers
EJRoufs
Participant
Posts: 73
Joined: Tue Aug 19, 2003 2:12 pm
Location: USA

Post by EJRoufs »

chulett wrote: If the job itself didn't actually start in 60/90 seconds, it would throw a timeout error of -14. It *is* actually starting, and the 12 minute "delay" you are seeing before the monitor kicks in is the amount of time it is taking your before-job routine to run.

Simple and small as it may be, it is the culprit.

I'm thinking that's not it, because the before-job routine we have is fairly new, while the problem has been around longer. It's worth a shot, though. I'll simply remove the routine from a few of the jobs and see if they continue to do the same thing.
Eric
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

If your before-job routine is using OPENSEQ statements on a common sequential file, the jobs will LOCK each other out. Consider passive reads instead, such as calling DSExecute to "type" the file and picking the contents up from the screen output it returns. I'd bet a box of donuts your jobs are locking each other out.

OPENSEQ will obtain an internal file lock on sequential files for the duration of processing. A RELEASE statement can free up the file, but the first suggestion I gave is much better as it doesn't require any locking.
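
As a rough illustration of the passive-read idea, here is a minimal sketch in Python rather than DS BASIC. It has the operating system print the file and captures that output, so the reading process never opens the file itself. The function names `read_file_via_shell` and `parse_params`, and the NAME=VALUE layout of the parameter file, are assumptions made for the example; nothing in this thread says what the real file looks like.

```python
import os
import subprocess


def read_file_via_shell(path):
    """Return the file's text by having the OS print it ("type" or "cat").

    The calling process only captures the command's output; it does not
    open the file itself.
    """
    if os.name == "nt":
        cmd = ["cmd", "/c", "type", path]
    else:
        cmd = ["cat", path]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def parse_params(text):
    """Parse NAME=VALUE lines (an assumed format) into a dict."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not NAME=VALUE
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params
```

In a before-job routine the same shape would apply: run the command, take the captured text, and split it into parameters, with no lock to release afterwards.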
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Nothing else it could be, Eric. And Mr Box-o-Doughnuts has nailed it, I would think. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
EJRoufs
Participant
Posts: 73
Joined: Tue Aug 19, 2003 2:12 pm
Location: USA

Post by EJRoufs »

chulett wrote: Nothing else it could be, Eric. And Mr Box-o-Doughnuts has nailed it, I would think. :wink:
Sounds good. I will definitely give that a shot, then. :) Any ideas why it is only a problem occasionally, and not all the time, if I'm running the exact same job every time? Or why it might "wait" for 10 minutes, when it only takes a matter of seconds to read in the parameters?

Thanks for all your help, guys! :)
Eric
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If multiple jobs read the same parameter file using the same routine, only the one that actually owns the lock is reading it at any time. The others are queued, waiting for the lock to be released. There may be some sleep time involved in this, depending on how your particular operating system implements waits on semaphores.
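
To see why a read that takes seconds can still add up to a long wait, here is a toy model in Python. It is only a sketch under stated assumptions, not a description of how DataStage or any operating system really schedules lock waits: every job asks for the lock at time 0, each holds it for `hold_seconds`, and a waiting job only notices that the lock is free on its next retry, every `poll_seconds`. The function name and all of the numbers are made up for illustration.

```python
import math


def simulate_lock_queue(n_jobs, hold_seconds, poll_seconds):
    """Return the time at which each queued job finally gets the lock.

    Assumes all jobs request the lock at time 0 and are served in order.
    A waiter retries only every poll_seconds (0 means it wakes instantly).
    """
    acquired_times = []
    lock_free_at = 0.0
    for _ in range(n_jobs):
        if poll_seconds > 0:
            # The waiter sees the free lock on its next retry tick.
            acquired_at = math.ceil(lock_free_at / poll_seconds) * poll_seconds
        else:
            acquired_at = lock_free_at
        acquired_times.append(acquired_at)
        lock_free_at = acquired_at + hold_seconds
    return acquired_times


# Four jobs, 2-second read, instant wake-up: the last job waits 6 seconds.
print(simulate_lock_queue(4, 2, 0))   # [0.0, 2.0, 4.0, 6.0]
# Same jobs, but waiters retry every 5 seconds: the last job waits 15 seconds.
print(simulate_lock_queue(4, 2, 5))   # [0, 5, 10, 15]
```

In this model the delay depends on how many jobs happen to be queued at that moment and on the retry interval, not on how long the read itself takes, which would fit a problem that only shows up occasionally.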