When I try to run my job on more than one node, it aborts with a few fatal errors such as:
RemDup,0: Failure during execution of operator logic.
RemDup,2: Fatal Error: waitForWriteSignal(): Premature EOF on node iehibu12 Socket operation on non-socket
main_program: APT_PMsectionLeader(1, node1), player 4 - Unexpected exit status 1.
node_node4: Player 4 terminated unexpectedly.
Sort,3: Could not send close message (shared memory)
or
Sort,3: sendWriteSignal() failed on node iehibu12 ds = 4 conspart = 3 Broken pipe
I don't know why it fails every time I run it on more than one node; the same job runs fine on one node.
I searched the forum but haven't found a resolution for this.
1 Node Configuration File
{
node "node1"
{
fastname "iehibu12"
pools ""
resource disk "/u03/Datasets" {pools ""}
resource scratchdisk "/u03/Datasets" {pools ""}
}
}
2 Node Configuration File
{
node "node1"
{
fastname "iehibu12"
pools ""
resource disk "/u03/Datasets" {pools ""}
resource scratchdisk "/u04/Scratch" {pools ""}
}
node "node2"
{
fastname "iehibu12"
pools ""
resource disk "/u03/Datasets" {pools ""}
resource scratchdisk "/u04/Scratch" {pools ""}
}
}
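Besides the node count, the two files also differ in their scratchdisk path (/u03/Datasets in the 1-node file, /u04/Scratch in the 2-node file). A minimal Python sketch for comparing the two files and checking the resource directories is below. It is not a DataStage tool: the helper names parse_config, resource_paths and not_writable are hypothetical, the line-based parsing only handles the simple layout shown above, and the directory check is only meaningful when run on iehibu12 as the user that runs the job.

```python
import os
import re

ONE_NODE = '''{
node "node1"
{
fastname "iehibu12"
pools ""
resource disk "/u03/Datasets" {pools ""}
resource scratchdisk "/u03/Datasets" {pools ""}
}
}'''

TWO_NODE = '''{
node "node1"
{
fastname "iehibu12"
pools ""
resource disk "/u03/Datasets" {pools ""}
resource scratchdisk "/u04/Scratch" {pools ""}
}
node "node2"
{
fastname "iehibu12"
pools ""
resource disk "/u03/Datasets" {pools ""}
resource scratchdisk "/u04/Scratch" {pools ""}
}
}'''


def parse_config(text):
    """Map each node name to its fastname and resource paths (simple layouts only)."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r'node\s+"([^"]+)"', line)
        if m:
            current = nodes.setdefault(
                m.group(1), {"fastname": None, "disk": [], "scratchdisk": []}
            )
            continue
        if current is None:
            continue
        m = re.match(r'fastname\s+"([^"]+)"', line)
        if m:
            current["fastname"] = m.group(1)
            continue
        m = re.match(r'resource\s+(disk|scratchdisk)\s+"([^"]+)"', line)
        if m:
            current[m.group(1)].append(m.group(2))
    return nodes


def resource_paths(nodes):
    """Collect every disk and scratchdisk path used by any node."""
    return {
        path
        for node in nodes.values()
        for kind in ("disk", "scratchdisk")
        for path in node[kind]
    }


def not_writable(paths):
    """Return the paths that are not existing directories writable by this user."""
    return sorted(
        p for p in paths if not (os.path.isdir(p) and os.access(p, os.W_OK))
    )


if __name__ == "__main__":
    one = parse_config(ONE_NODE)
    two = parse_config(TWO_NODE)
    # Paths that appear only in the 2-node file
    print(sorted(resource_paths(two) - resource_paths(one)))
    # Directories that are missing or not writable on this machine
    print(not_writable(resource_paths(two)))
```

The first print lists the paths that only the 2-node file uses; the second depends on the machine it runs on, so no output is shown for it here.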
Regards,