sendWriteSignal() failed on node LKF-PISRVENG01 ds = 7 consp
Posted: Wed Oct 29, 2014 10:49 am
Hi,
We have a parallel job that fails with the error message below. We do not know whether it fails because of bad design or because of resource contention.
Can anyone share some thoughts, please?
It runs daily, but it fails only sometimes.
Error Log:
Error Timestamp: 2014-10-24 10:08:33
Error Job Name: J_NS_CATAMARAN_HLTRANS834_EMP_LOAD_BLK
Error Job Path: \Jobs\PowerSTEPP\ELIGIBILITY\CATAMARAN_HLTRANS834_BLK\EXTRACT
Error Message: Unhandled abort encountered in job J_NS_CATAMARAN_HLTRANS834_EMP_LOAD_BLK
Job Log is as below:
7165\2014-10-24 10:08:15\1\\376\From previous run (...)
7166\2014-10-24 10:08:15\5\\377\Starting Job J_NS_CATAMARAN_HLTRANS834_EMP_LOAD_BLK. (...)
7167\2014-10-24 10:08:15\1\\377\Attached Message Handlers: (...)
7168\2014-10-24 10:08:16\1\\377\Environment variable settings: (...)
7169\2014-10-24 10:08:16\1\\377\Parallel job initiated
7170\2014-10-24 10:08:16\1\\377\OSH script (...)
7171\2014-10-24 10:08:18\1\\377\main_program: IBM WebSphere DataStage Enterprise Edition 8.5.0.6152 (...)
7172\2014-10-24 10:08:18\1\\377\main_program: conductor uname: -s=Windows_NT; -r=1; -v=6; -n=LKF-PISRVENG01; -m=Pentium
7173\2014-10-24 10:08:18\1\\377\main_program: orchgeneral: loaded (...)
7174\2014-10-24 10:08:20\1\\377\main_program: APT configuration file: D:/IBM/InformationServer/Server/Configurations/Node2.apt (...)
7175\2014-10-24 10:08:25\1\\377\main_program: This step has 23 datasets: (...)
7176\2014-10-24 10:08:25\3\\377\APT_CombinedOperatorController,1: Fatal Error: Caught unknown exception in player process: terminating.
7177\2014-10-24 10:08:25\3\\377\node_node2: Player 7 terminated unexpectedly.
7178\2014-10-24 10:08:25\3\\377\main_program: APT_PMsectionLeader(2, node2), player 7 - Unexpected exit status 1.
7179\2014-10-24 10:08:25\3\\377\node_node2: Player 4 terminated unexpectedly.
7180\2014-10-24 10:08:25\3\\377\main_program: APT_PMsectionLeader(2, node2), player 4 - Unexpected exit status 1.
7181\2014-10-24 10:08:25\3\\377\LKP_MEMBER,1: sendWriteSignal() failed on node LKF-PISRVENG01 ds = 7 conspart = 1 Broken pipe
7182\2014-10-24 10:08:25\3\\377\LKP_MEMBER,1: Write to dataset on [fd 16] failed (Error 0) on node node2, hostname LKF-PISRVENG01
7183\2014-10-24 10:08:25\3\\377\LKP_MEMBER,1: Block write failure. Partition: 1
7184\2014-10-24 10:08:25\3\\377\LKP_MEMBER,1: sendWriteSignal() failed on node LKF-PISRVENG01 ds = 7 conspart = 1 Broken pipe
7185\2014-10-24 10:08:25\3\\377\LKP_MEMBER,1: Write to dataset on [fd 16] failed (Error 0) on node node2, hostname LKF-PISRVENG01
7186\2014-10-24 10:08:25\3\\377\LKP_MEMBER,1: Block write failure. Partition: 1
7187\2014-10-24 10:08:25\3\\377\LKP_MEMBER,1: Internal Error: (shbuf): iomgr\iomgr.C: 1901
7188\2014-10-24 10:08:25\3\\377\node_node2: Player 3 terminated unexpectedly.
7189\2014-10-24 10:08:25\3\\377\main_program: APT_PMsectionLeader(2, node2), player 3 - Unexpected exit status 1.
7190\2014-10-24 10:08:25\3\\377\LKP_MEMBER,0: sendWriteSignal() failed on node LKF-PISRVENG01 ds = 7 conspart = 1 Broken pipe
7191\2014-10-24 10:08:25\3\\377\LKP_MEMBER,0: Write to dataset on [fd 19] failed (Error 0) on node node1, hostname LKF-PISRVENG01
7192\2014-10-24 10:08:30\3\\377\LKP_MEMBER,0: Block write failure. Partition: 1
7193\2014-10-24 10:08:30\3\\377\main_program: Step execution finished with status = FAILED.
7194\2014-10-24 10:08:30\1\\377\main_program: Startup time, 0:12; production run time, 0:00.
7195\2014-10-24 10:08:30\1\\377\Contents of phantom output file (...)
7196\2014-10-24 10:08:31\5\\377\Job J_NS_CATAMARAN_HLTRANS834_EMP_LOAD_BLK aborted.
7197\2014-10-24 10:08:31\7\\377\(SEQ_J_NS_CATAMARAN_HLTRANS834_BLK) <- J_NS_CATAMARAN_HLTRANS834_EMP_LOAD_BLK: Job under control finished.
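In case it helps anyone triaging a log export like the one above, here is a minimal Python sketch that splits each line into its backslash-separated fields and lists the error entries in order, so the first failure is easy to spot. The field layout (event id, timestamp, type code, an empty field, a run number, message) and the reading of type code 3 as "fatal" are assumptions inferred from this log, not a documented format; `parse_log` and `fatal_entries` are hypothetical helper names.

```python
import re

# Assumed layout, inferred from the log above:
#   event id \ timestamp \ type code \ (empty) \ run number \ message
LINE_RE = re.compile(r"^(\d+)\\([\d-]+ [\d:]+)\\(\d)\\\\(\d+)\\?(.*)$")


def parse_log(text):
    """Parse exported log text into a list of dicts, one per log entry."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            event_id, timestamp, type_code, _run, message = match.groups()
            entries.append({
                "id": int(event_id),
                "time": timestamp,
                "type": int(type_code),
                "message": message,
            })
        elif entries and line.strip():
            # A line without the expected prefix is treated as a
            # continuation of the previous entry's message.
            entries[-1]["message"] += line.strip()
    return entries


def fatal_entries(entries, fatal_code=3):
    """Return entries whose type code equals fatal_code (assumed: 3 = fatal)."""
    return [entry for entry in entries if entry["type"] == fatal_code]


if __name__ == "__main__":
    sample = "\n".join([
        r"7175\2014-10-24 10:08:25\1\\377\main_program: This step has 23 datasets: (...)",
        r"7176\2014-10-24 10:08:25\3\\377\APT_CombinedOperatorController,1: Fatal Error: Caught unknown exception in player process: terminating.",
        r"7181\2014-10-24 10:08:25\3\\377\LKP_MEMBER,1: sendWriteSignal() failed on node LKF-PISRVENG01 ds = 7 conspart = 1 Broken pipe",
    ])
    for entry in fatal_entries(parse_log(sample)):
        print(entry["id"], entry["time"], entry["message"])
```

Applied to the log above, the earliest entry with type code 3 is 7176 (the APT_CombinedOperatorController exception); the LKP_MEMBER "Broken pipe" messages come after it.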