Hello Everyone,
I am trying to run a job sequence that calls 10 instances of a job. At the same time, another job sequence is running about 30 instances of 3 jobs (10 each).
As soon as I run my job sequence, it aborts with this error message:
GL_Journal_Line.NJ1_03.JobControl (@GetLedgerAccount): Controller problem: Error calling DSRunJob(GL_Journal_Line_GetLedgerAccount.NJ1_3), code=-14
[Timed out while waiting for an event]
Before posting this query, I searched the forum for 'code=-14' and found many postings. They all talked about server overload.
But in my case, as far as I can see, CPU usage is 32% and PF usage is 4.30GB (15GB in total).
So we have resources available, and I do not know why the next job sequence is not picking up the processors/memory.
Thanks and Regards
Timed out while waiting for an event
Arun Verma
Your server is overloaded - period. It was unable to start your job within the timeout period (90 seconds?) and thus the error. There's more to it than just CPU usage when the word 'overload' is used.
In your case it sounds like a matter of the number of concurrently running jobs - or the number you are attempting to start at the same time. Or both. Throttle back. Or reduce the number of simultaneously starting jobs.
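To illustrate what "throttle back" means in practice, here is a minimal sketch in Python (not DataStage code). It caps how many jobs run at once; `run_throttled`, `run_job` and `max_concurrent` are hypothetical names, and `run_job` stands in for whatever actually starts a job on your system.

```python
from concurrent.futures import ThreadPoolExecutor


def run_throttled(jobs, run_job, max_concurrent=5):
    """Run every job, but never more than max_concurrent at the same time.

    run_job is a hypothetical callable that starts one job and returns
    its result; it is a placeholder for the real launcher.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map keeps the results in the same order as the input jobs.
        return list(pool.map(run_job, jobs))
```

The worker pool size is the only knob here: lowering `max_concurrent` trades total elapsed time for less pressure on the server while jobs start.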
-craig
"You can never have too many knives" -- Logan Nine Fingers
In your sequence, put in a call to "SLEEP RND(5)+5" to give you a minimum 5 second delay with a random additional delay. This might be enough to keep your server from overloading while starting up all those processes.
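For anyone who wants to see the same idea outside DataStage BASIC, here is a minimal Python sketch of a staggered start. It assumes RND(5) yields an integer from 0 to 4, so each pause is 5 to 9 seconds; `staggered_start` and `launch` are hypothetical names, not part of any DataStage API.

```python
import random
import time


def staggered_start(jobs, launch, min_delay=5, jitter=5,
                    sleep=time.sleep, rng=random):
    """Launch jobs one by one with a randomized pause between starts.

    The pause is min_delay plus a random integer in 0..jitter-1 seconds,
    mirroring the assumed behaviour of SLEEP RND(5)+5.
    launch is a hypothetical callable that starts a single job.
    """
    delays = []
    for index, job in enumerate(jobs):
        if index > 0:
            # No pause before the first job; pause before every later one.
            delay = min_delay + rng.randrange(jitter)
            sleep(delay)
            delays.append(delay)
        launch(job)
    return delays
```

The random part matters as much as the fixed part: it keeps several sequences that start together from waking up and launching jobs at exactly the same moment.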
theverma wrote: Can we increase the number of jobs that can be run simultaneously?
Short answer - no. There's no magic config change that will solve your problem. You need to address the issue of the number of simultaneously running/starting jobs, as noted.
-craig
"You can never have too many knives" -- Logan Nine Fingers
We have been struggling with this dreadful error for two years now. It happens randomly and occasionally in almost any sequence, no matter how many jobs are starting simultaneously. And we are pretty sure our DataStage machine doesn't hit its limits when this happens. The fact that most of the time everything runs OK makes it hard for us to accept the suggestion to decrease the number of jobs running simultaneously. How does one decrease the number if the number is one?
Only yesterday I found this topic:
viewtopic.php?t=101191&highlight=ecase+70788
We are going to try this and (hopefully) live happily ever after.
Good luck theverma!
Possible solution for those who have powerful servers
Dear All,
We had the same problem of timeout errors while starting many simultaneous jobs. Server load was only about 30% of maximum.
The solution in our case was to increase GLTABSZ and RLTABSZ. To be able to do that, we also had to increase disk shared memory (using DMEMOFF, PMEMOFF, CMEMOFF, NMEMOFF).
After changing uvconfig do not forget to run
uvregen
and check
./bin/uv "CONFIG ALL"
As stated by IBM, GLTABSZ should be equal to RLTABSZ.
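If you want a quick sanity check after editing, something like the following Python sketch could confirm that the two values match. It assumes uvconfig holds one "NAME value" pair per line with "#" starting a comment; the function names and the sample values are hypothetical, so check against your own file.

```python
def parse_uvconfig(text):
    """Parse 'NAME value' lines into a dict, ignoring '#' comments.

    The line format is an assumption about uvconfig, not a documented spec.
    """
    params = {}
    for line in text.splitlines():
        # Drop anything after a '#' and surrounding whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) == 2:
            params[fields[0]] = fields[1]
    return params


def lock_tables_match(text):
    """Return True when GLTABSZ is present and equals RLTABSZ."""
    params = parse_uvconfig(text)
    return "GLTABSZ" in params and params["GLTABSZ"] == params.get("RLTABSZ")
```

This only reads the text of the file; it does not replace running uvregen or checking the live values with CONFIG ALL.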
Hope this helps
Alexander