
Error while running configuration file

Posted: Fri Jul 31, 2009 8:48 am
by thepakks
Hi,

I am trying to run with the configuration file below and get the following error:

It runs 5 processes on 4 nodes.
Host key verification failed.
Host key verification failed.
Host key verification failed.
##W IIS-DSEE-TFPM-00152 20:17:56(000) <main_program> Accept timed out retries = 16
##E IIS-DSEE-TFPM-00153 20:17:56(001) <main_program> The section leader on CNDABMCSDBDT03 died
##E IIS-DSEE-TFPM-00356 20:17:56(002) <main_program>


And the following is my configuration file:

{
    node "node1"
    {
        fastname "CNDAMBCSDBDT04"
        pools ""
        resource disk "/datafs6/Datasets" {pools ""}
        resource scratchdisk "/datafs6/Scratch" {pools ""}
    }
    node "db2node1"
    {
        fastname "CNDABMCSDBDT03"
        pools "db2"
        resource disk "/datafs6/temp" {pools ""}
        resource scratchdisk "/datafs6/temp" {pools ""}
    }
    node "db2node2"
    {
        fastname "CNDABMCSDBDT03"
        pools "db2"
        resource disk "/datafs6/temp" {pools ""}
        resource scratchdisk "/datafs6/temp" {pools ""}
    }
    node "db2node3"
    {
        fastname "CNDABMCSDBDT03"
        pools "db2"
        resource disk "/datafs6/temp" {pools ""}
        resource scratchdisk "/datafs6/temp" {pools ""}
    }
}
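A quick way to see which hosts a configuration file actually touches is to list the fastname of every node. The following is a minimal Python sketch, assuming the simple layout shown above (node "name" followed by a block whose first entry is fastname). It is a regex scan, not a full parser for the configuration file grammar, and the function names are hypothetical.

```python
import re

# Matches: node "<name>" { fastname "<host>"
# Assumes fastname is the first entry inside each node block, as in the file above.
NODE_RE = re.compile(r'node\s+"([^"]+)"\s*\{\s*fastname\s+"([^"]+)"')


def node_fastnames(config_text):
    """Return a dict mapping each logical node name to its fastname (host)."""
    return dict(NODE_RE.findall(config_text))


def hosts_in_config(config_text):
    """Return the distinct fastnames in first-seen order."""
    hosts = []
    for host in node_fastnames(config_text).values():
        if host not in hosts:
            hosts.append(host)
    return hosts
```

Run against the file above, this would report two hosts, CNDAMBCSDBDT04 and CNDABMCSDBDT03, with three of the four logical nodes on CNDABMCSDBDT03, which is the host named in the "section leader ... died" message.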



Please let me know how to resolve this.

Thanks in advance.

Posted: Fri Jul 31, 2009 8:56 am
by miwinter
See:

"Host key verification failed.
Host key verification failed.
Host key verification failed."

It's a permissions issue: the node you are running from is not being allowed to connect to the node named in the error.
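"Host key verification failed." is the text the OpenSSH client prints when it cannot verify the remote host's key, so one reasonable check (assuming the engine is launching its remote processes over ssh on this system) is to try a non-interactive login from the node you run the job on, as the same user that runs the job. The following is a minimal Python sketch; the helper names are hypothetical, and the hints are rough guesses based on common ssh error text, not an exhaustive diagnosis.

```python
import subprocess


def probe_command(host):
    """Build a non-interactive ssh command that just runs 'true' on host.

    BatchMode=yes makes ssh fail instead of prompting for a password or
    asking whether to accept an unknown host key.
    """
    return ["ssh", "-o", "BatchMode=yes", host, "true"]


def diagnose(stderr_text):
    """Map common ssh error text to a short hint (hypothetical helper)."""
    if "Host key verification failed" in stderr_text:
        return "host key missing or changed in ~/.ssh/known_hosts"
    if "Permission denied" in stderr_text:
        return "authentication not set up for this user"
    if stderr_text.strip():
        return "other error: " + stderr_text.strip()
    return "ok"


def probe(host, timeout=15):
    """Run the probe and return (exit code, hint).

    Raises subprocess.TimeoutExpired if the host does not answer in time.
    """
    result = subprocess.run(probe_command(host), capture_output=True,
                            text=True, timeout=timeout)
    return result.returncode, diagnose(result.stderr)
```

For example, `probe("CNDABMCSDBDT03")` run from the other node should return `(0, "ok")` once the remote login works without any prompt.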

Posted: Fri Jul 31, 2009 9:35 am
by thepakks
Please let me know the steps to resolve this issue.
I have checked the node file and its permissions are correct.

Posted: Fri Jul 31, 2009 10:15 am
by ArndW
Not permissions on the node file, but permissions to the nodes referenced by the file (the fastnames).
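Put another way, every fastname other than the host the job is started from needs to accept a remote login from that host. The following minimal Python sketch filters a list of fastnames down to the remote ones; it compares short host names case-insensitively, which is a simplification that may need adjusting for your DNS setup. The function name and the example domain are hypothetical, and the thread does not say which host the job is started from.

```python
import socket


def remote_fastnames(fastnames, local_host=None):
    """Return the distinct fastnames that are not the local host.

    Compares only the short host name (before the first dot), ignoring case.
    """
    if local_host is None:
        local_host = socket.gethostname()
    local_short = local_host.split(".")[0].lower()
    remote = []
    for name in fastnames:
        short = name.split(".")[0].lower()
        if short != local_short and name not in remote:
            remote.append(name)
    return remote
```

For instance, if the job were started from a host whose name is `cndambcsdbdt04.example.com`, the fastnames in the file above would reduce to `["CNDABMCSDBDT03"]`, the one host that must accept the connection.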

Posted: Sun Aug 02, 2009 10:38 pm
by thepakks
I have checked the permissions of the node file. Everything there is OK.

Posted: Sun Aug 02, 2009 11:56 pm
by ray.wurlod
Have you checked that the keys for rsh have properly been emplaced?

Posted: Mon Aug 03, 2009 8:45 am
by miwinter
That's not what Arnd, Ray and I have been pointing out. The problem is permissions between the nodes: in plain English, one server is not able to access another. It seems to me you would be best off speaking to your Unix administrator.

Posted: Mon Aug 03, 2009 5:09 pm
by ray.wurlod
Read the man page for rsh and learn about the different ways the remote host can allow access. One of these authentication methods must be set up for rsh to work (and therefore for multi-machine parallel jobs, which rely on rsh, to work).
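One of the methods the rsh man pages describe is a ~/.rhosts file in the home directory of the target user on the remote host, listing "hostname username" pairs that may log in without a password (/etc/hosts.equiv is another). The following minimal Python sketch builds such entries and writes the file with owner-only permissions, since rsh typically ignores a .rhosts file that others can write. The user name `dsadm` in the usage note is a hypothetical example, and whether rsh (rather than ssh) is enabled at all is a site decision for your Unix administrator.

```python
import os


def rhosts_lines(hosts, user):
    """Build ~/.rhosts content: one 'hostname username' entry per host."""
    entries = []
    for host in hosts:
        entry = f"{host} {user}"
        if entry not in entries:  # skip duplicates
            entries.append(entry)
    return "\n".join(entries) + "\n"


def write_rhosts(path, content):
    """Write the file, then force mode 0600 (owner read/write only)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(path, 0o600)  # os.open's mode only applies when creating
```

For example, on CNDABMCSDBDT03, `rhosts_lines(["CNDAMBCSDBDT04"], "dsadm")` yields the single entry that lets `dsadm` on CNDAMBCSDBDT04 start processes there.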