Sequence Error

novice_pgr · Post by **novice_pgr** » Mon Feb 27, 2006 12:28 am

Hi All

<Seq_Name>.JobControl
(@JAC_SEPCS_MDSNG_NAMPLT);controller problem; Error calling
DSRunJob(<jobname>,code=-14
(Timed out while waiting for an event)

Has anyone faced this before?
Any suggestions to resolve it?

Nageshsunkoji · Post by **Nageshsunkoji** » Mon Feb 27, 2006 1:25 am

Hi,

I think one of your jobs was not in a runnable state, meaning it is not in a compiled state: it is either aborted or not compiled. So check the state of your jobs, and if one is not compiled, compile it.

Regards
Nagesh.

chulett · Post by **chulett** » Mon Feb 27, 2006 8:03 am

No, that's not the problem. If you search the forum for either the error in question or the phrase 'timed out while waiting for event' you'll see it's a resource issue. You are asking too much of your system, spooling up more jobs than it can handle.

There is a hard-coded value deep in the engine of 60 or 90 seconds (I do believe) as the 'timeout' value. When it attempts to start a job and it takes longer than that before it comes back and acknowledges that it started, it throws that timeout error.

That being said, what about some specifics? Hardware? Job design? When you say 'fourth job' how many actual processes are we talking about, up to and including that point?

novice_pgr · Post by **novice_pgr** » Mon Feb 27, 2006 9:10 am

Craig

System = SunOS
Node = ssi2
Release = 5.8
KernelID = Generic_108528-18
Machine = sun4u
NumCPU = 2

The job design is to do an insert/update on the target table. But the sequence has 18 jobs within it to be invoked, so 18 jobs will be started in parallel.

So, are there any fixes for this? Is there a parameter somewhere in a config file that can be changed?

kumar_s · Post by **kumar_s** » Mon Feb 27, 2006 9:21 am

In the Administrator, under 'Operator specific', there is an option called DSIPC_OPEN_TIMEOUT, which might be 30 by default. This can be increased to 300 - 600.
But you still need to nail down the cause of the timeout. If it really is a lack of resources, it may not be advisable to run all the jobs in parallel. Split up your sequence, reschedule your jobs, or try invoking a 'SLEEP nn' command in an Execute Command activity to slow the pace at which the jobs are called, rather than calling them all at a stretch.
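The idea of pacing the job starts can be sketched outside DataStage as well. This is only an illustration in Python, not anything the engine provides: `start_staggered`, `start_job` and `delay_seconds` are hypothetical names, and `start_job` stands in for whatever mechanism you actually use to launch a job.

```python
import time
from typing import Callable, Iterable, List


def start_staggered(
    jobs: Iterable[str],
    start_job: Callable[[str], None],
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Start each job in turn, pausing between consecutive starts.

    start_job is a hypothetical callable that launches one job by name.
    sleep is injectable so the pacing can be checked without real waiting.
    """
    started = []
    for index, job in enumerate(jobs):
        if index > 0:
            # Pause before every start except the first one.
            sleep(delay_seconds)
        start_job(job)
        started.append(job)
    return started
```

The delay only spreads out the startup load. If the box cannot sustain all the jobs once they are running, pacing the starts will not be enough on its own.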

kwwilliams · Post by **kwwilliams** » Mon Feb 27, 2006 9:28 am

18 jobs at one time is a bit much for a 2 CPU box. Each job is contending for time on the CPU. I would run your jobs in groups of 4-5. Keep in mind that when you move this into a production environment, there will be other jobs running besides the ones you have created here. So you need to consider not only the design of your job, but also the design of the jobs already running in your production environment.
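As a rough sketch of the "groups of 4-5" idea, here is how the grouping could look in Python. The names `chunk`, `run_in_groups` and `run_job` are hypothetical; `run_job` is assumed to start one job and block until it finishes. In a real sequence you would get the same effect by linking job activities so that one group completes before the next begins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def chunk(jobs: Sequence[str], size: int) -> List[List[str]]:
    """Split the job list into consecutive groups of at most `size` jobs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(jobs[i:i + size]) for i in range(0, len(jobs), size)]


def run_in_groups(
    jobs: Sequence[str],
    run_job: Callable[[str], T],
    group_size: int = 4,
) -> List[T]:
    """Run jobs group by group; a group must finish before the next starts.

    run_job is a hypothetical callable that runs one job and waits for it.
    """
    results: List[T] = []
    for group in chunk(jobs, group_size):
        # Jobs inside a group run concurrently; leaving the "with" block
        # waits for all of them before moving on to the next group.
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            results.extend(pool.map(run_job, group))
    return results
```

With 18 jobs and a group size of 5, this gives three groups of 5 and one of 3, so no more than 5 jobs compete for the two CPUs at any time.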

ray.wurlod · Post by **ray.wurlod** » Mon Feb 27, 2006 10:26 am

Keep in mind that - potentially - every stage in a parallel job requires one process on each processing node in the configuration file, not to mention one section leader process per job per processing node and one conductor process per job on the conductor node. Multiply this by 18 and you have way too many processes for a two-CPU machine. Upgrade to at least a 32-CPU machine. Or a cluster of 16 two-CPU machines.
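To put a number on that, the arithmetic can be sketched as below. The figures of 5 stages per job and 2 processing nodes are assumptions for illustration only, since the thread does not give the actual job design or configuration file. The result is the potential upper bound described above, not a measurement.

```python
def processes_per_job(stages: int, nodes: int) -> int:
    """Potential process count for one parallel job.

    One process per stage per processing node, plus one section leader
    per processing node, plus one conductor for the job.
    """
    stage_processes = stages * nodes
    section_leaders = nodes
    conductor = 1
    return stage_processes + section_leaders + conductor


def processes_for_sequence(jobs: int, stages: int, nodes: int) -> int:
    """Potential process count when all jobs in the sequence run at once."""
    return jobs * processes_per_job(stages, nodes)


# Assumed example: 18 jobs, 5 stages each, 2-node configuration file.
print(processes_per_job(5, 2))           # 13
print(processes_for_sequence(18, 5, 2))  # 234
```

Even with these modest assumed numbers, that is potentially over two hundred processes competing for two CPUs.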

novice_pgr · Post by **novice_pgr** » Mon Feb 27, 2006 9:55 pm

I want to kill the processes currently being run by one user and start them again fresh.

If I do ps -ef | grep <user>
I get a process like the one below, which I don't have permission to kill.

Can you tell me what this process is doing?

/u01/appl/DataStage/DataStage/PXEngine/bin/osh -APT_PMsectionLeaderFlag ssi2 10

I get this error even when I run the jobs while no other jobs are running, so I want to make sure there are no active processes left for that user ID.
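For the check itself, a small sketch in Python that filters `ps -ef` output for one user's `osh` processes might look like this. The function names are hypothetical, and it assumes the usual `ps -ef` layout where the first column is the user and the second is the PID. It only lists processes and does not kill anything.

```python
import subprocess
from typing import List, Tuple


def find_processes(ps_output: str, user: str, needle: str = "osh") -> List[Tuple[int, str]]:
    """Return (pid, full line) for lines owned by `user` that mention `needle`.

    Assumes `ps -ef` style output: header line first, then UID and PID
    as the first two whitespace-separated columns.
    """
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        uid, pid, rest = fields
        if uid == user and pid.isdigit() and needle in rest:
            matches.append((int(pid), line.strip()))
    return matches


def list_user_processes(user: str, needle: str = "osh") -> List[Tuple[int, str]]:
    """Run `ps -ef` and filter its output for the given user."""
    output = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
    return find_processes(output, user, needle)
```

An empty list from `list_user_processes("<user>")` would suggest nothing is left over for that user. Ending any processes it does find still has to be done by their owner or by root.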

rasi · Post by **rasi** » Mon Feb 27, 2006 10:20 pm

Start a separate thread on the forum for this. Do a search on how to kill processes. It's been covered many times in the forum.
