Detection of node failure takes too long

Posted: Thu Jan 18, 2007 4:21 pm
by UPS
We are testing node failure scenarios in our DataStage cluster and find that the conductor node can take a very long time, as long as 25 minutes in one case, to detect the failure of a compute node that is executing a job. The job just hangs for that whole period before it aborts. Is there a setting that controls this amount of time, so that the conductor detects sooner that a section leader is no longer there?