
Grid issue: main_program: Accept timed out retries = 16

Posted: Mon Jan 11, 2010 9:28 am
by pascalnicolasl
Hello All,

All our jobs were running fine until today.
Now we are facing an issue when running a job in the grid environment.
We get this error message:
main_program: Accept timed out retries = 16

The job runs fine when grid is disabled.

Any hint on this will be helpful.

Posted: Tue Jan 12, 2010 8:34 pm
by keshav0307
All the compute nodes in the grid queue are busy. The qstat command will show how many jobs are already in the queue, so delete the jobs from the grid queue using qdel.

By disabling the grid, you are bypassing the queue and overloading the compute node.
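
The qstat / qdel check described above could be scripted roughly as follows. This is only a sketch: the thread does not say which resource manager the grid uses, so it assumes one that provides plain qstat and qdel commands (Grid Engine and Torque/PBS both do) and passes no scheduler-specific flags. The function names show_queue and build_qdel_command are made up for illustration.

```python
import shutil
import subprocess


def show_queue():
    """Run plain `qstat` and return its raw output.

    Returns None when qstat is not on PATH (for example, when this is
    run on a machine that is not a grid head node).
    """
    if shutil.which("qstat") is None:
        return None
    result = subprocess.run(
        ["qstat"], capture_output=True, text=True, check=False
    )
    return result.stdout


def build_qdel_command(job_ids):
    """Build the argument list for deleting the given job IDs with qdel.

    The command is only built here, not executed, so it can be reviewed
    before any job is actually removed from the queue.
    """
    ids = [str(job_id) for job_id in job_ids]
    if not ids:
        raise ValueError("no job IDs given")
    return ["qdel", *ids]
```

A possible use: print the result of show_queue(), pick the job IDs that are stuck, then pass the list from build_qdel_command(...) to subprocess.run once you are sure those jobs should go. Any job IDs you feed in must come from your own qstat listing.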

Posted: Thu Jan 14, 2010 10:49 am
by pascalnicolasl
Restarting the Linux box resolved the problem.

Thanks