
Sequence Error

Posted: Mon Feb 27, 2006 12:28 am
by novice_pgr
Hi All

<Seq_Name>.JobControl
(@JAC_SEPCS_MDSNG_NAMPLT);controller problem; Error calling
DSRunJob(<jobname>,code=-14
(Timed out while waiting for an event)


Has anyone faced this before?
Any suggestions on how to resolve it?

Posted: Mon Feb 27, 2006 1:25 am
by Nageshsunkoji
Hi,

I think one of your jobs was not in a runnable state, meaning it is not compiled: it is either in an aborted state or has not been compiled. So check the status of your jobs, and if any of them is not compiled, compile it.

Regards
Nagesh.

Posted: Mon Feb 27, 2006 8:03 am
by chulett
No, that's not the problem. If you search the forum for either the error in question or the phrase 'timed out while waiting for event', you'll see it's a resource issue. You are asking too much of your system, spooling up more jobs than it can handle.

There is a hard-coded value deep in the engine of 60 or 90 seconds (I do believe) that serves as the 'timeout' value. When the engine attempts to start a job and the job takes longer than that to come back and acknowledge that it started, the engine throws that timeout error.
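To make that mechanism concrete, here is a toy Python model of the start-up handshake described above. It is not the DataStage engine's code: the timeout value, the function names, and the simulated start-up delay are all hypothetical. The caller starts a job and waits for a "started" event; if the job takes longer than the timeout to acknowledge, the caller gives up with an error, even though the job itself might still come up later.

```python
import threading
import time

# Hypothetical stand-in for the engine's hard-coded start-up timeout.
# The post guesses 60 or 90 seconds; a tiny value is used here for illustration.
START_TIMEOUT_SECONDS = 0.5


class JobStartTimeout(Exception):
    """Raised when a job does not acknowledge its start in time (cf. code=-14)."""


def start_job(name: str, startup_delay: float, timeout: float = START_TIMEOUT_SECONDS) -> str:
    """Launch a simulated job and wait for its 'started' event.

    startup_delay models how long an overloaded machine takes before the
    job process gets scheduled and reports back.
    """
    started = threading.Event()

    def job_process() -> None:
        time.sleep(startup_delay)  # time spent waiting for CPU / process slots
        started.set()              # the job acknowledges that it has started

    threading.Thread(target=job_process, daemon=True).start()
    if not started.wait(timeout):  # wait() returns False if the timeout expires
        raise JobStartTimeout(f"{name}: timed out while waiting for an event")
    return f"{name}: started"


print(start_job("lightly_loaded", startup_delay=0.1))
try:
    start_job("overloaded", startup_delay=2.0)
except JobStartTimeout as exc:
    print(exc)
```

The point of the model is that the error says nothing about whether the job is valid; it only says the acknowledgement did not arrive in time, which is why a busy machine triggers it.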

That being said, what about some specifics? Hardware? Job design? When you say 'fourth job', how many actual processes are we talking about, up to and including that point?

statistics info

Posted: Mon Feb 27, 2006 9:10 am
by novice_pgr
Craig

System = SunOS
Node = ssi2
Release = 5.8
KernelID = Generic_108528-18
Machine = sun4u
NumCPU = 2


The job design is an insert/update into the target table. But the sequence invokes 18 jobs, so all 18 jobs are started in parallel.

So, are there any fixes for this? Are there any parameters in some config file that could be changed?

Posted: Mon Feb 27, 2006 9:21 am
by kumar_s
In the DataStage Administrator settings, under 'Operator specific', there is an option called DSIPC_OPEN_TIMEOUT, which may default to 30. This can be increased to 300-600.
But really, you need to nail down the cause of the timeout first. If it really is a lack of resources, it may not be advisable to run all the jobs in parallel. Split up your sequence, reschedule your jobs, or invoke a 'SLEEP nn' command in an Execute Command activity to slow down the pace of starting all the jobs in one stretch.
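In the sequence itself, this pacing would be an Execute Command activity running SLEEP between the Job activities. The Python below only mimics the idea outside DataStage: `start_staggered`, the job names, and the pause length are hypothetical, and `start` stands in for whatever actually launches a job.

```python
import time
from typing import Callable, List


def start_staggered(jobs: List[str], start: Callable[[str], None], pause_seconds: float) -> None:
    """Start each job in turn, pausing between starts (like a 'SLEEP nn' step)."""
    for i, job in enumerate(jobs):
        if i > 0:
            time.sleep(pause_seconds)  # give the previous job time to come up
        start(job)


started_at = []
t0 = time.monotonic()
start_staggered(
    ["job_a", "job_b", "job_c"],
    lambda name: started_at.append((name, round(time.monotonic() - t0, 1))),
    pause_seconds=1.0,
)
print(started_at)  # each job starts roughly one pause after the previous one
```

Staggering does not reduce the total load once all jobs are running; it only spreads out the start-up burst, which is the moment the timeout applies to.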

Posted: Mon Feb 27, 2006 9:28 am
by kwwilliams
18 jobs at one time is a bit much for a 2-CPU box, since each job contends for time on the CPUs. I would run your jobs in groups of 4-5. Keep in mind that when you move this into a production environment, there will be other jobs running besides the ones you have created here. So you need to consider not only the design of your own jobs but also the design of the jobs already running in your production environment.
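Here is a Python sketch of "groups of 4-5", assuming fixed batches where one group must finish before the next starts. The job names and `run_in_groups` are hypothetical; in DataStage you would express the same thing by chaining the Job activities so that each group waits on the previous one.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def chunk(items: Sequence[str], size: int) -> List[List[str]]:
    """Split the job list into consecutive groups of at most `size` jobs."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def run_in_groups(jobs: Sequence[str], run: Callable[[str], T], group_size: int = 5) -> List[T]:
    """Run one group at a time; a group starts only after the previous one finishes."""
    results: List[T] = []
    for group in chunk(jobs, group_size):
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            # map() preserves job order; consuming it waits for the whole group
            results.extend(pool.map(run, group))
    return results


jobs = [f"job_{n:02d}" for n in range(1, 19)]  # 18 hypothetical job names
print([len(g) for g in chunk(jobs, 5)])  # [5, 5, 5, 3]
```

Fixed batches are the simplest reading of the advice. A single pool with `max_workers=5` would instead start the next job as soon as any running job finishes, which keeps the box busier without ever exceeding five concurrent jobs.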

Posted: Mon Feb 27, 2006 10:26 am
by ray.wurlod
Keep in mind that - potentially - every stage in a parallel job requires one process on each processing node in the configuration file, not to mention one section leader process per job per processing node and one conductor process per job on the conductor node. Multiply this by 18 and you have way too many processes for a two-CPU machine. Upgrade to at least a 32-CPU machine. Or a cluster of 16 two-CPU machines.
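That rule of thumb turns into a quick back-of-the-envelope calculation. The stage count and node count below are made-up example values, and the result is an upper bound ("potentially"): the actual number depends on the job design and the configuration file.

```python
def processes_per_job(stages: int, nodes: int) -> int:
    """Potential processes for one parallel job, per the rule of thumb above."""
    players = stages * nodes   # one process per stage per processing node
    section_leaders = nodes    # one section leader per processing node
    conductor = 1              # one conductor on the conductor node
    return players + section_leaders + conductor


def total_processes(stages: int, nodes: int, jobs: int) -> int:
    """Potential processes when `jobs` such jobs run at the same time."""
    return jobs * processes_per_job(stages, nodes)


# Hypothetical example: 18 jobs, each with 8 stages, on a 2-node configuration
total = total_processes(stages=8, nodes=2, jobs=18)
print(total)      # 342
print(total / 2)  # 171.0 processes per CPU on a 2-CPU machine
```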

Posted: Mon Feb 27, 2006 9:55 pm
by novice_pgr
I want to kill the processes that are currently being run by one user and start running them fresh.

If I do ps -ef | grep <user>
I get a process like the one below, which I don't have permission to kill.

Can you tell me what this process is doing?

/u01/appl/DataStage/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag ssi2 10

When I try running the jobs, I get this error even when no other jobs are being run. So I want to make sure there are currently no active processes running for that user ID.
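Judging by its -APT_PMsectionLeaderFlag argument, the osh process above appears to be a parallel-engine section leader of the kind described earlier in the thread (one per job per processing node). Below is a Python sketch that lists such processes for one user from `ps -ef` output, so you can check whether any are left over. The user name, PIDs, and sample output are hypothetical. The sketch only lists processes; killing another user's processes still requires that user's account or root, which is why the kill was refused.

```python
import subprocess
from typing import List, Tuple

SECTION_LEADER_FLAG = "-APT_PMsectionLeaderFlag"


def read_ps_ef() -> str:
    """Capture the output of `ps -ef` (capture_output needs Python 3.7+)."""
    return subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True).stdout


def section_leaders(ps_output: str, user: str) -> List[Tuple[str, str]]:
    """Return (pid, line) for osh section-leader processes owned by `user`."""
    found = []
    for line in ps_output.splitlines()[1:]:  # first line is the UID/PID/... header
        fields = line.split()
        if len(fields) < 2:
            continue
        uid, pid = fields[0], fields[1]
        # Match on the whole line: STIME may contain a space (e.g. "Feb 27"),
        # so the command column is not at a fixed field index.
        if uid == user and SECTION_LEADER_FLAG in line:
            found.append((pid, line.strip()))
    return found


# Hypothetical sample of `ps -ef` output for a user named dsadm
sample = """\
     UID   PID  PPID  C    STIME TTY      TIME CMD
   dsadm 12345     1  0 21:40:01 ?        0:02 /u01/appl/DataStage/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag ssi2 10
   dsadm 12399 12001  0 21:41:10 pts/3    0:00 -ksh
"""
print([pid for pid, _ in section_leaders(sample, "dsadm")])  # ['12345']
```

On a live system you would pass `read_ps_ef()` instead of `sample`.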

Posted: Mon Feb 27, 2006 10:20 pm
by rasi
Post a separate thread on the forum. Do a search on how to kill a process; it's been covered many times in the forum.