Hanging Jobs and SIGPIPE in APT_JobmonFilter::writeMessage

Posted: Thu Dec 04, 2003 12:05 pm
by adscot
Hello,

I have a job that is hanging for around 40 minutes before crashing with the following message:

Code:

main_program: SIGPIPE signal received in APT_JobmonFilter::writeMessage.
Caused by closing of socket on port 13,400.
Turning off job monitoring for the rest of the job.
"If a process tries to write to a pipe that has no reader, it will be sent the SIGPIPE signal from the kernel"
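To illustrate the quoted behaviour, here is a minimal POSIX sketch. It writes to a pipe whose read end has already been closed. With SIGPIPE ignored, the write fails with EPIPE instead of killing the process. The function name `writeToPipeWithoutReader` is made up for this example and is not taken from APT_JobmonFilter, whose source is not shown in this thread.

```cpp
#include <cerrno>
#include <signal.h>
#include <unistd.h>

// Hypothetical demo helper (not part of any DataStage/Orchestrate API).
// Writes one byte to a pipe that has no reader and returns:
//   the errno reported by write() (EPIPE is expected on POSIX systems),
//   0 if the write unexpectedly succeeded,
//  -1 if the pipe could not be created.
int writeToPipeWithoutReader() {
    // Ignore SIGPIPE so the failed write reports EPIPE; the default
    // action for SIGPIPE would terminate the process instead.
    struct sigaction ignore{};
    struct sigaction previous{};
    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    sigaction(SIGPIPE, &ignore, &previous);

    int fds[2];
    if (pipe(fds) != 0) {
        sigaction(SIGPIPE, &previous, nullptr);
        return -1;
    }
    close(fds[0]);  // close the read end: the pipe now has no reader

    const char byte = 'x';
    ssize_t written = write(fds[1], &byte, 1);
    int err = (written == -1) ? errno : 0;

    close(fds[1]);
    sigaction(SIGPIPE, &previous, nullptr);  // restore earlier disposition
    return err;
}
```

On Linux and other POSIX systems the call should return EPIPE. A socket whose peer has closed the connection generally behaves the same way, which would fit the job monitor closing its end while the job is still writing to it.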

Interestingly, this problem only seems to happen once the file gets over a certain number of records: it works fine for 1,000 (12s) and 10,000 (32s) records but hangs with 50,000. The job itself uses a number of buildops to perform lookups.

I am thinking that somehow job monitoring is causing a problem. Should I turn off job monitoring for this job and if so how do I do this?

Or is there a better way to diagnose this problem?

Cheers,

Adrian

Posted: Thu Dec 04, 2003 2:38 pm
by bigpoppa
I think you're right about the jobmon. It might be timing out because nothing is being sent to it. I don't know exactly how to turn off the job monitor, but if you root around the PX Engine bin and utilities directories on the server, you might find a script related to jobmon. Then you can back it up, write a new jobmon file that does nothing, and try executing your job again.

If anyone else has ideas on how to turn off the PX job mon, please share.

Anyhow, if it does turn out to be a job mon error, then please report it to ASCL, as it looks like a bug.

Thanks,
BP

Posted: Thu Dec 04, 2003 4:22 pm
by Teej
Go to your `cat /.dshome`/../PXEngine/java/ and take a look at the JobMonApp.log file, and see if you can find anything recent. You may need to run the job and observe the difference in that file.

You can also turn on the debug mode for JobMonApp. Open the file jobmoninit, and edit the following line:

Code:

nohup $APT_ORCHHOME/java/jre/bin/java JobMonApp $jobmon_port1 $jobmon_port2 > $APT_ORCHHOME/java/JobMonApp.log 2>&1 &
to be

Code:

nohup $APT_ORCHHOME/java/jre/bin/java JobMonApp $jobmon_port1 $jobmon_port2 -debug > $APT_ORCHHOME/java/JobMonApp.log 2>&1 &
Have the Admin start and stop the monitor (you need root to do this), and see what the debug output says.

If nothing, call Ascential Support.

-T.J.

Posted: Fri Dec 05, 2003 6:18 am
by adscot
Hello,

Thanks for your replies/help.

I looked in the log:
$DSHOME/../PXEngine/java/JobMonApp.log

...

Code:

ResponseParseHandler created.

Fatal Parsing Error Occurred:The element type "responses" must be terminated by the matching end-tag "</responses>".
Received malformed xml syntax... Closing Connection...Stopping after fatal error: The element type "responses" must be terminated by the matching end-tag "</responses>".
org.xml.sax.SAXException: Stopping after fatal error: The element type "responses" must be terminated by the matching end-tag "</responses>".
        at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1228)
        at org.apache.xerces.framework.XMLDocumentScanner.reportFatalXMLError(XMLDocumentScanner.java:579)
        at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.endOfInput(XMLDocumentScanner.java:1406)
        at org.apache.xerces.framework.XMLDocumentScanner.endOfInput(XMLDocumentScanner.java:418)
        at org.apache.xerces.validators.common.XMLValidator.sendEndOfInputNotifications(XMLValidator.java:694)
        at org.apache.xerces.readers.DefaultEntityHandler.changeReaders(DefaultEntityHandler.java:1026)
        at org.apache.xerces.readers.XMLEntityReader.changeReaders(XMLEntityReader.java:168)
        at org.apache.xerces.readers.StreamingCharReader.changeReaders(StreamingCharReader.java:126)
        at org.apache.xerces.readers.StreamingCharReader.scanContent(StreamingCharReader.java:909)
        at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1094)
        at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
        at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1081)
        at ProducerInputReaderThread.run(ProducerInputReaderThread.java:117)
RequestParseHandler created.
Time to contact support.

Although I have just had another thought: I will check that my buildops are not outputting anything, just in case this gets sent to the job monitor (is this possible?).

Cheers,

Adrian

Posted: Fri Dec 05, 2003 12:13 pm
by Teej
adscot wrote: Time to contact support.
Ouch. I am not sure who in Support is capable of figuring out that mess. :) Let me know.

adscot wrote: Although I have just had another thought: I will check that my buildops are not outputting anything, just in case this gets sent to the job monitor (is this possible?).
It is possible to output to the logs. Just do a 'cout << "blah";' or 'cerr << "bleh";'.

There is also:

Code:

*errorLog() << "I had lunch.";
errorLog().logError(3);
If you want to abort.

-T.J.

Posted: Fri Dec 05, 2003 6:56 pm
by ray.wurlod
The support group has access to Engineering, which is forbidden to us mere mortals. :wink:
That is, even they (the support analysts) can escalate problems. Kudos to them, they seem to solve most of our problems without needing to.