Job Mon Port

Posted: Sat May 07, 2005 1:50 am
by grb_garre
Hi


Are there any clues as to what causes Job Monitor port failures?
Can anybody share ideas?


Thanks in advance
Raj

Posted: Sat May 07, 2005 5:38 am
by ray.wurlod
Not with so little information.

Can you please post actual error/warning messages? Some idea of your setup would be useful too - for example what might be trying to monitor jobs? MetaStage? Or are you working in a cluster, such that exchanges between player processes on different nodes and between player, section leader and conductor processes must occur using TCP?

Posted: Sat May 07, 2005 6:45 am
by grb_garre
Ray,

Initially we ran the job in a 4-node configuration (clustered) and then
went back to a 2-node configuration (SMP).
The first run gives fatal errors:

1)Error when checking operator: temp.dst has 4 partitions, but only 2 are
accessible from the nodes in the configuration file.
2)Error when checking operator: The dataset will not be deleted.
3)Could not check all operators because of previous error(s)
4)temp.dst could not be deleted

On the next run, if I delete the dataset and run the job, it goes through, but it gives a warning on the job monitor port:

Failed to connect to JobMonApp on port 13401

Thanks

Posted: Sun May 08, 2005 7:22 am
by Amos.Rosmarin
Hi,

I have had a very, very long correspondence with Ascential support regarding this error.
I use Solaris and have already received 2 patches that made things better, but still not perfect.

The monitor is located in $APT_ORCHHOME/java
and there are logs you can see there.

The ports that the monitor is using are in
$APT_ORCHHOME/etc/jobmon_ports

The defaults are 13400 and 13401
If you know of any other application that uses those ports, it is best to change them to avoid collisions.
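
One way to check whether something is already listening on those ports is to filter `netstat -an` output. This is only a minimal sketch. It assumes a netstat that prints a LISTEN state and separates address and port with either `.` (as on Solaris and AIX) or `:` (as on Linux). `port_listening` is a hypothetical helper name, not part of DataStage.

```shell
#!/bin/sh
# Hypothetical helper (not a DataStage tool): reads netstat-style lines on
# stdin and succeeds if any LISTEN line shows the given local port.
port_listening() {
    port="$1"
    # Match ".13401" or ":13401" followed by whitespace, on a LISTEN line only.
    grep 'LISTEN' | grep -q "[.:]${port}[[:space:]]"
}

# 13400 and 13401 are the defaults mentioned above; adjust if jobmon_ports differs.
for p in 13400 13401; do
    if netstat -an 2>/dev/null | port_listening "$p"; then
        echo "port $p: something is listening"
    else
        echo "port $p: nothing listening"
    fi
done
```

If a port shows as listening while the monitor is known to be stopped, another application probably holds it, and that is the collision case where changing the port makes sense.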

If you get the msg:
Failed to connect to JobMonApp on port 13401

it is best to start the service:

Code:

$APT_ORCHHOME/java/jobmoninit start

HTH,
Amos

Posted: Sun May 08, 2005 3:06 pm
by ray.wurlod
Thank you for the detailed solution. So it appears that the job monitor had not been started? (Should've thought of that - the first question from a support analyst is "is it switched on?")

Posted: Mon May 09, 2005 8:25 pm
by T42
There is a known issue with JobMonApp related to a job crashing in an unusual way -- for some reason, the job takes JobMonApp down with it.

There are a number of patches available, but again, as Amos mentioned, they do not appear to fully resolve the problem.

Go to $APT_ORCHHOME/java and take a look at the last few lines of the latest log. I bet you it'll be similar to the messages I have been getting on my AIX box.
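
A minimal sketch for pulling up the tail of the newest file in that directory. The thread does not give the log file names, so the helper (`tail_latest`, a hypothetical name) simply picks the most recently modified regular file. The optional third argument is an assumed glob for narrowing the choice if the directory also holds non-log files. File names containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print the last N lines (default 20) of the most
# recently modified regular file in a directory, optionally limited by a glob.
tail_latest() {
    dir="$1"
    lines="${2:-20}"
    pattern="${3:-*}"
    latest=""
    # ls -td lists matches newest first without descending into directories;
    # $pattern is left unquoted on purpose so the shell expands the glob.
    for f in $(ls -td "$dir"/$pattern 2>/dev/null); do
        if [ -f "$f" ]; then
            latest="$f"
            break
        fi
    done
    if [ -z "$latest" ]; then
        echo "no matching files in $dir" >&2
        return 1
    fi
    echo "== $latest =="
    tail -n "$lines" "$latest"
}

# Falls back to the current directory if APT_ORCHHOME is not set.
tail_latest "${APT_ORCHHOME:-.}/java" 20 || true
```

Running it right after a "Failed to connect to JobMonApp" warning should show whatever the monitor wrote last before it went down.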

I am still trying to nail this bug, but lately JobMonApp has been behaving -- and that corresponds to developers actually improving their job designs and no longer getting hard crashes (as opposed to normal aborts).