Hanging Jobs and SIGPIPE in APT_JobmonFilter::writeMessage

adscot
Participant
Posts: 12
Joined: Mon Oct 13, 2003 3:04 am
Location: London

Post by adscot »

Hello,

I have a job that is hanging for around 40 minutes before crashing with the following message:

Code:

main_program: SIGPIPE signal received in APT_JobmonFilter::writeMessage.
Caused by closing of socket on port 13,400.
Turning off job monitoring for the rest of the job.
"If a process tries to write to a pipe that has no reader, it will be sent the SIGPIPE signal from the kernel"

Interestingly, this problem only seems to happen once the file gets over a certain number of records: it works fine for 1,000 records (12s) and 10,000 records (32s) but hangs with 50,000. The job itself uses a number of buildops to perform lookups.
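
The SIGPIPE behaviour quoted above can be reproduced in a few lines of C++ on a POSIX system. This is a minimal sketch, not the job monitor's code: it uses an ordinary pipe as a stand-in for the socket named in the log (an assumption), and the function name writeToReaderlessPipe is made up for this example. With SIGPIPE ignored, the failed write reports EPIPE instead of terminating the process.

```cpp
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Writes one byte to a pipe whose read end has already been closed.
// Returns the errno value of the failed write, or 0 if the write succeeded.
// Note: this leaves SIGPIPE ignored for the rest of the process.
int writeToReaderlessPipe() {
    // Ignore SIGPIPE; its default action would terminate the process.
    std::signal(SIGPIPE, SIG_IGN);

    int fds[2];
    if (pipe(fds) != 0) {
        return errno;              // could not create the pipe at all
    }
    close(fds[0]);                 // close the read end: no reader is left

    ssize_t written = write(fds[1], "x", 1);
    int err = (written == -1) ? errno : 0;

    close(fds[1]);
    return err;
}
```

On Linux the function should return EPIPE. Without the std::signal line, the same write would raise SIGPIPE and, by default, end the process.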

I am thinking that somehow job monitoring is causing a problem. Should I turn off job monitoring for this job and if so how do I do this?

Or is there a better way to diagnose this problem?

Cheers,

Adrian
bigpoppa
Participant
Posts: 190
Joined: Fri Feb 28, 2003 11:39 am

Post by bigpoppa »

I think you're right about the jobmon. It might be 'timing out' because nothing is being sent to it. I don't know exactly how to turn off the job monitor, but if you root around the PX Engine bin and utilities directories on the server, you might find a script related to jobmon. You can then back it up, write a new jobmon file that does nothing, and try executing your script again.

If anyone else has ideas on how to turn off the PX job mon, please share.

Anyhow, if it does turn out to be a job mon error, then please report it to ASCL, as it looks like a bug.

Thanks,
BP
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

Go to `cat /.dshome`/../PXEngine/java/ and look at the JobMonApp.log file to see whether it contains anything recent. You may need to run the job and watch how that file changes.

You can also turn on the debug mode for JobMonApp. Open the file jobmoninit, and edit the following line:

Code:

nohup $APT_ORCHHOME/java/jre/bin/java JobMonApp $jobmon_port1 $jobmon_port2 > $APT_ORCHHOME/java/JobMonApp.log 2>&1 &
to be

Code:

nohup $APT_ORCHHOME/java/jre/bin/java JobMonApp $jobmon_port1 $jobmon_port2 -debug > $APT_ORCHHOME/java/JobMonApp.log 2>&1 &
Have the Admin stop and restart the monitor (you need root to do this) so that the change takes effect, then see what the debug output says.

If that turns up nothing, call Ascential Support.

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
adscot
Participant
Posts: 12
Joined: Mon Oct 13, 2003 3:04 am
Location: London

Post by adscot »

Hello,

Thanks for your replies/help.

I looked in the log:
$DSHOME/../PXEngine/java/JobMonApp.log

...

Code:

ResponseParseHandler created.

Fatal Parsing Error Occurred:The element type "responses" must be terminated by the matching end-tag "</responses>".
Received malformed xml syntax... Closing Connection...Stopping after fatal error: The element type "responses" must be terminated by the matching end-tag "</responses>".
org.xml.sax.SAXException: Stopping after fatal error: The element type "responses" must be terminated by the matching end-tag "</responses>".
        at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1228)
        at org.apache.xerces.framework.XMLDocumentScanner.reportFatalXMLError(XMLDocumentScanner.java:579)
        at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.endOfInput(XMLDocumentScanner.java:1406)
        at org.apache.xerces.framework.XMLDocumentScanner.endOfInput(XMLDocumentScanner.java:418)
        at org.apache.xerces.validators.common.XMLValidator.sendEndOfInputNotifications(XMLValidator.java:694)
        at org.apache.xerces.readers.DefaultEntityHandler.changeReaders(DefaultEntityHandler.java:1026)
        at org.apache.xerces.readers.XMLEntityReader.changeReaders(XMLEntityReader.java:168)
        at org.apache.xerces.readers.StreamingCharReader.changeReaders(StreamingCharReader.java:126)
        at org.apache.xerces.readers.StreamingCharReader.scanContent(StreamingCharReader.java:909)
        at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1094)
        at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
        at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1081)
        at ProducerInputReaderThread.run(ProducerInputReaderThread.java:117)
RequestParseHandler created.
Time to contact support.
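
The trace passes through XMLDocumentScanner.endOfInput, which suggests the parser reached the end of the stream before it saw the closing tag, i.e. the message was cut short rather than garbled in the middle. As a rough illustration only, the C++ sketch below flags that condition in a captured buffer. The function name isTruncatedResponses and the sample payloads are hypothetical; the actual job monitor message format is not shown in this thread.

```cpp
#include <string>

// Returns true when the buffer contains an opening <responses ...> tag
// that is never followed by the matching </responses> end-tag.
// This is a plain substring check, not a real XML parser.
bool isTruncatedResponses(const std::string& xml) {
    const std::string openTag = "<responses";
    const std::string closeTag = "</responses>";

    std::size_t openPos = xml.find(openTag);
    if (openPos == std::string::npos) {
        return false;              // no <responses> element at all
    }
    // Look for the end-tag only after the opening tag.
    return xml.find(closeTag, openPos) == std::string::npos;
}
```

For example, isTruncatedResponses("<responses><response id=\"1\"/>") is true, while the same text followed by "</responses>" gives false.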

Although I have just had another thought: I will check that my buildops are not outputting anything, just in case this gets sent to the job monitor (is this possible?).

Cheers,

Adrian
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

adscot wrote: Time to contact support.
Ouch. I am not sure who in Support is capable of figuring out that mess. :) Let me know.

adscot wrote: Although I have just had another thought and will check my buildops are not outputting anything just in case this gets sent to the job monitor (is this possible...)
It is possible to output to the logs. Just do a 'cout << "blah";' or 'cerr << "bleh";'.

There is also:

Code:

*errorLog() << "I had lunch.";
errorLog().logError(3);
If you want to abort.
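
One way to check whether buildop code prints anything through cout or cerr is to run the same logic outside the job with both streams temporarily redirected into a buffer. The sketch below is plain standard C++ and does not use the Orchestrate API; captureConsoleOutput and StreamRedirect are names invented for this example.

```cpp
#include <functional>
#include <iostream>
#include <sstream>
#include <string>

// RAII helper: points a stream at another buffer and restores the
// original buffer on destruction, even if an exception is thrown.
class StreamRedirect {
public:
    StreamRedirect(std::ostream& stream, std::streambuf* target)
        : stream_(stream), saved_(stream.rdbuf(target)) {}
    ~StreamRedirect() { stream_.rdbuf(saved_); }
    StreamRedirect(const StreamRedirect&) = delete;
    StreamRedirect& operator=(const StreamRedirect&) = delete;

private:
    std::ostream& stream_;
    std::streambuf* saved_;
};

// Runs body and returns everything it wrote to std::cout and std::cerr.
std::string captureConsoleOutput(const std::function<void()>& body) {
    std::ostringstream captured;
    StreamRedirect redirectOut(std::cout, captured.rdbuf());
    StreamRedirect redirectErr(std::cerr, captured.rdbuf());
    body();
    return captured.str();
}
```

Wrapping the lookup code in a lambda and passing it to captureConsoleOutput gives an empty string if nothing was printed. This only catches iostream output; anything written with printf or directly to the file descriptors would not be captured.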

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The support group has access to Engineering, which is forbidden to us mere mortals. :wink:
That is, even they (the support analysts) can escalate problems. Kudos to them, they seem to solve most of our problems without needing to.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.