Time out warnings

GY768 · Post by **GY768** » Fri Oct 28, 2005 8:36 am

Can anyone help with the following:

I have been running a job sequence. Jobs within the sequence that contain pivot tables are producing many files (over a million), and DataStage produces the error below. I have investigated every area I can think of and have failed to come up with a solution.

It seems the timeout occurs because the jobs are in the aborted state; however, the action within the sequence is to reset the job if required.

Can anyone help?

run_DWH_ELEMENT..JobControl (@Seq_Xfm_Lnd_Pld_DWH_ELEMENT_QUOTE):
Controller problem: Error calling
DSRunJob(Seq_Xfm_Lnd_Pld_DWH_ELEMENT_QUOTE), code=-14 [Timed out while waiting for an event]

kcbland · Post by **kcbland** » Fri Oct 28, 2005 8:55 am

The node that hosts the DSEngine is probably slammed; your job control is timing out in its request to run a job.

Think of it this way: the API sends a message to the engine process asking it to start the job, the engine doesn't acknowledge the message, and the API times out. The only reason the engine process didn't respond is that the node is overwhelmed.

Your message comes from the DSRunJob API, and code -14 is its timeout error. If you're on Solaris, run prstat from a telnet session and monitor the node load. If you're on AIX or HP-UX, get your hands on top or glance to see the same load measures. You'll need to talk to Ascential tech support about any patches that might mitigate this issue.

track_star · Post by **track_star** » Fri Oct 28, 2005 9:07 am

Are you running multiple instance jobs? Also, how many jobs are running concurrently when you get the error? There are a few settings in the uvconfig that might alleviate the issue.

GY768 · Post by **GY768** » Fri Oct 28, 2005 10:14 am

Yes, at any one time there are on average between 12 and 15 multiple instance jobs running in the sequence.

kcbland · Post by **kcbland** » Fri Oct 28, 2005 10:22 am

The multiple-instance jobs aren't the issue; the load on the node is. I can have hundreds of very resource-light jobs running simultaneously without issue, but one intensive job can use all resources.

You need to monitor server load. Every single DS developer needs to get into the habit of having top, prstat, glance, or whatever tool is available running all of the time. It's even better to have a single tool gathering the information continuously, 24x7x365, and making it available to everyone.
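As a rough illustration of the continuous gathering described above, here is a minimal sketch that appends a timestamped load sample to a log file. The log path `/tmp/ds_load.log` and the function name `log_load_sample` are made up for this example, and `uptime` merely stands in for whichever tool (prstat, top, glance) is available on your platform.

```shell
#!/bin/sh
# Hypothetical log location for this sketch; override by setting LOAD_LOG.
LOAD_LOG="${LOAD_LOG:-/tmp/ds_load.log}"

# Append one line: a timestamp followed by the output of uptime,
# which reports load averages on most Unix systems.
log_load_sample() {
    sample=$(uptime 2>/dev/null || echo "uptime not available")
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$sample" >> "$LOAD_LOG"
}

# Example: take one sample a minute until interrupted.
# while true; do log_load_sample; sleep 60; done
```

Lining up the timestamps in this log with the time of the code=-14 error should show whether the node was busy when the request to start the job timed out.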

track_star · Post by **track_star** » Fri Oct 28, 2005 12:43 pm

GY768, can you post the values from the following entries in your uvconfig file:

RLTABSZ
GLTABSZ
MAXRLOCK
UVSYNC

The file is in the DSEngine directory.
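One possible way to pull those four values out on a Unix server is sketched below. It assumes uvconfig lists each parameter on its own line as a name followed by its value, with comment lines starting with `#`, and that the file is at `$DSHOME/uvconfig`. The function name `show_lock_params` is made up for this example.

```shell
#!/bin/sh
# Print the four requested parameters from a uvconfig file.
# Assumption: one "NAME value" pair per line; comment lines begin with '#'.
show_lock_params() {
    uvconfig_file="${1:-$DSHOME/uvconfig}"
    grep -E '^(RLTABSZ|GLTABSZ|MAXRLOCK|UVSYNC)[[:space:]]' "$uvconfig_file"
}

# Example: show_lock_params            (reads $DSHOME/uvconfig)
#          show_lock_params /some/other/uvconfig
```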

ray.wurlod · Post by **ray.wurlod** » Fri Oct 28, 2005 6:41 pm

Not sure where you're going with that, but without GSEMNUM they're not particularly useful. Surely if collisions on locks in the Repository were the problem, a report from SEMAPHORE.STATUS would be a useful thing? It might also be worth creating $DSHOME/errlog to capture any Engine errors.

```shell
touch $DSHOME/errlog
chmod 777 $DSHOME/errlog
```

bkarth · Post by **bkarth** » Mon Nov 28, 2005 12:15 pm

Hello,

We are getting the same error on a Win 2003 Server.

RLTABSZ = 75
GLTABSZ = 75
MAXRLOCK = 74
UVSYNC = 0

Is there any patch for this? All these jobs are very simple and shouldn't take any significant resources. I am not sure what is causing this issue.

DS Version is 7.5x2 (Server Job Sequence)

Thanks,
Karthik

ray.wurlod · Post by **ray.wurlod** » Wed Nov 30, 2005 12:04 am

What else is happening on the server? I recall one site that ran DS on their Primary Domain Controller (!) - this error occurred many times!
