Failed to connect to JobMonApp on port 13401

Posted: Thu Mar 24, 2016 10:37 am
by MrBlack
Occasionally I'm seeing this error being thrown:

Code:

Failed to connect to JobMonApp on port 13401

main_program: Received SIGPIPE signal caused by closing of the socket on port 13,400.
No output will be sent to port 13,400 for the rest of the job.
We're only seeing this occur in one job/project. It just happens to be our largest. The job runs every night and spawns hundreds of other jobs in the project. We've seen this happen 3-4 times in the past 3 months. Upon receiving the error, we can immediately restart the job and everything will run fine.

Can anyone suggest things to look at to help identify what might be happening? I have my server and network guys checking on their end to see if it's a hardware problem. I'm wondering whether there is something on the application side that would provide more insight, or something that could be changed, such as a timeout variable.

Posted: Fri Mar 25, 2016 6:32 pm
by JRodriguez
MrBlack,
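Set the environment variable APT_NO_JOBMON=True on the job; that might prevent the warning. Be aware that this variable turns off job monitoring for the job, so it should suppress the connection attempt rather than explain why it fails.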
Regards
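
For anyone who wants to rule out the network side before disabling monitoring, here is a minimal sketch of a port check that could be run on the engine host just before the nightly job starts. The port numbers 13400 and 13401 come from the error messages in the first post. The host name `localhost`, the helper name `can_connect` and the 3-second timeout are assumptions made for illustration, not anything DataStage provides.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the with-block closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Ports taken from the log text above; "localhost" is an assumption.
    for port in (13400, 13401):
        state = "accepting connections" if can_connect("localhost", port) else "not reachable"
        print(f"port {port}: {state}")
```

Logging the result with a timestamp each night would at least show whether the failures line up with the port being unreachable at job start.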