Jobs getting aborted when running on 2 or more nodes

priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Hi All,

When I try to run my job on more than one node, it aborts with a few fatal errors such as:

Code: Select all

RemDup,0: Failure during execution of operator logic. 

RemDup,2: Fatal Error: waitForWriteSignal(): Premature EOF on node iehibu12 Socket operation on non-socket

main_program: APT_PMsectionLeader(1, node1), player 4 - Unexpected exit status 1.

node_node4: Player 4 terminated unexpectedly.

Sort,3: Could not send close message (shared memory)

or 
Sort,3: sendWriteSignal() failed on node iehibu12 ds = 4 conspart = 3 Broken pipe

I don't know why it fails every time I try to run it on more than one node; the same job runs fine on one node.

I searched the forum but have not found a resolution for this.

1 Node Configuration File

Code: Select all

{
	node "node1"
	{
		fastname "iehibu12"
		pools ""
		resource disk "/u03/Datasets" {pools ""}
		resource scratchdisk "/u03/Datasets" {pools ""}
	}
}

2 Node Configuration File

Code: Select all

{
	node "node1"
	{
		fastname "iehibu12"
		pools ""
		resource disk "/u03/Datasets" {pools ""}
		resource scratchdisk "/u04/Scratch" {pools ""}
	}
	node "node2"
	{
		fastname "iehibu12"
		pools ""
		resource disk "/u03/Datasets" {pools ""}
		resource scratchdisk "/u04/Scratch" {pools ""}
	}
}
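
One visible difference between the two files is that the scratchdisk moves from /u03/Datasets to /u04/Scratch, so it may be worth confirming that every resource path exists and is writable for the user running the job. This may well not be the cause of the aborts. The following is a minimal sketch, assuming Python is available on the server; `list_nodes` and `unusable_paths` are hypothetical helper names, not part of DataStage, and the parsing only handles files laid out like the ones above rather than the full configuration file grammar.

```python
import os
import re

# The 2-node configuration from this post, pasted in for illustration.
TWO_NODE_CONFIG = '''
{
    node "node1"
    {
        fastname "iehibu12"
        pools ""
        resource disk "/u03/Datasets" {pools ""}
        resource scratchdisk "/u04/Scratch" {pools ""}
    }
    node "node2"
    {
        fastname "iehibu12"
        pools ""
        resource disk "/u03/Datasets" {pools ""}
        resource scratchdisk "/u04/Scratch" {pools ""}
    }
}
'''


def list_nodes(config_text):
    """Map each node name to its fastname, disk paths and scratchdisk paths."""
    nodes = {}
    for match in re.finditer(r'node\s+"([^"]+)"\s*\{', config_text):
        # Walk forward to the brace that closes this node block;
        # the nested {pools ""} braces are balanced along the way.
        depth, pos = 1, match.end()
        while depth and pos < len(config_text):
            if config_text[pos] == "{":
                depth += 1
            elif config_text[pos] == "}":
                depth -= 1
            pos += 1
        body = config_text[match.end():pos]
        fastname = re.search(r'fastname\s+"([^"]+)"', body)
        nodes[match.group(1)] = {
            "fastname": fastname.group(1) if fastname else None,
            "disk": re.findall(r'resource\s+disk\s+"([^"]+)"', body),
            "scratchdisk": re.findall(r'resource\s+scratchdisk\s+"([^"]+)"', body),
        }
    return nodes


def unusable_paths(nodes):
    """Return (node, path) pairs that are not writable directories for the current user."""
    problems = []
    for name, info in nodes.items():
        for path in info["disk"] + info["scratchdisk"]:
            if not (os.path.isdir(path) and os.access(path, os.W_OK)):
                problems.append((name, path))
    return problems


if __name__ == "__main__":
    for node, path in unusable_paths(list_nodes(TWO_NODE_CONFIG)):
        print(f"{node}: {path} is missing or not writable")
```

The check only reflects the user who runs the script, so it would need to be run as the same user that runs the DataStage job.
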
I am unable to find the problem. Please advise.

Regards,
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

That machine name looks very familiar!

What stages are you using in that job?

Post by priyadarshikunal »

ArndW wrote: That machine name looks very familiar!

What stages are you using in that job? ...

3 Row Generators -> Funnel -> Sort -> Remove Duplicates -> Switch -> 2 Data Sets

It is a simple job built for testing.

And I think you worked on this system during your previous assignment.

Post by priyadarshikunal »

Just giving the thread a little bump :)

Post by ArndW »

If you replace the switch stage and everything after that with a simple peek, do you still get the same error?

Post by priyadarshikunal »

ArndW wrote: If you replace the switch stage and everything after that with a simple peek, do you still get the same error?

Yes, I get the same error.

But when I replace the Remove Duplicates stage and everything after it with a Data Set or Peek, the job runs fine on 4 nodes.