In the logs, the last lines read one of two things, depending on the day:
"Couldn't read from Socket: Connection reset"
and
"Could not write to subscriber socket."
I think these are written when it dies. How do I find what's killing it?
"Check there is no other application running on the ports taken by the JobMonApp"
So I installed lsof, because it accepts a port number as a parameter and shows what is using that port. Then I wrote a script that captures this to a log file every second; it is pasted below in case anyone can use it. I was hoping to spot the killer application by examining the file when jobmonitor dies. I kept watch for more than two weeks.
from lsof____ *:13400 29022 (LISTEN) from ps____ dsadm 29022 1 0 Jun 12 ? 8:09 /asc/Ascential/DataStage/DSEngine/java/jre/bin/PA_RISC2.0/java time____ 20070613 13:51:02
from lsof____ *:13401 29022 (LISTEN) from ps____ dsadm 29022 1 0 Jun 12 ? 8:09 /asc/Ascential/DataStage/DSEngine/java/jre/bin/PA_RISC2.0/java time____ 20070613 13:51:03
Additional lines were present when jobs were running (various phantoms, osh, sqlldr, etc.), but nothing suspicious. At the time jobmonitor died, this log also stopped receiving entries.
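Because the log simply goes quiet when nothing holds the port, the exact moment of death has to be inferred from the last entry. One possible refinement is to write an explicit marker line whenever lsof reports no process. This is only a sketch: `report_port` is a hypothetical helper name, the marker wording is made up, and it assumes lsof prints nothing when no process uses the port.

```shell
# report_port PORT: hypothetical helper, not part of the original script.
# Reads the output of `lsof -i :PORT` on stdin and prints one line per
# process holding the port, or an explicit marker when nothing holds it.
report_port() {
    port=$1
    now=$(date +'%Y%m%d %H:%M:%S')
    # NR > 1 skips the lsof header; keep the NAME, PID and state columns.
    holders=$(awk 'NR > 1 {print $9, $2, $10}')
    if [ -z "$holders" ]
    then
        # Assumed marker text; any fixed, easy-to-grep string would do.
        echo "from lsof____ nothing on port $port time____ $now"
    else
        echo "$holders" | while read -r line
        do
            echo "from lsof____ $line time____ $now"
        done
    fi
}
```

It could be called once per second as `/home/dsadm/lsof-4.77/lsof -i :13400 | report_port 13400`, so the first marker line in the log gives the time the port was released.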
#!/bin/sh
# Every second, log which processes hold the JobMonApp ports (13400 and 13401).
# Output goes to stdout; redirect it to a log file when starting the script.

LSOF=/home/dsadm/lsof-4.77/lsof

while true
do
    sleep 1
    for port in 13400 13401
    do
        # NR > 1 skips the lsof header; keep the NAME, PID and state columns.
        "$LSOF" -i ":$port" | awk 'NR > 1 {print $9, $2, $10}' |
        while read -r name pid state
        do
            # sed 1d drops the ps header line, whatever its column spacing.
            echo 'from lsof____' "$name" "$pid" "$state" ' from ps____' $(ps -f -p "$pid" | sed 1d) ' time____' $(date +'%Y%m%d %H:%M:%S')
        done
    done
done
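
To see when the entries stopped, the log can be summarised per port. A minimal sketch, assuming the log lines keep the "from lsof____ ... time____ DATE TIME" layout of the sample entries; `last_seen` is a hypothetical helper name and is not part of the script above:

```shell
# last_seen: hypothetical helper. Reads monitor log lines on stdin and
# prints, for each lsof NAME (e.g. *:13400), the timestamp of its last entry.
last_seen() {
    awk '
        $1 == "from" && $2 == "lsof____" {
            name = $3
            # The two fields after the "time____" marker are date and time.
            for (i = 4; i < NF; i++)
                if ($i == "time____")
                    last[name] = $(i + 1) " " $(i + 2)
        }
        END { for (n in last) print n, last[n] }
    ' | sort
}
```

Fed the log file on stdin (for example `last_seen < monitor.log`, where the file name is a placeholder), it prints one line per port with the time of the final entry, which is roughly when jobmonitor went away.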