We are getting the following error when running multiple instances of our job concurrently, even when running only 2 in parallel. We have checked MFILES etc. The failure seems to happen in either a sort stage or a join stage.
node_node1: Player 4 terminated unexpectedly. [processmgr/player.C:138]
main_program: Unexpected termination by Unix signal 10(SIGBUS) [processmgr/slprocess.C:425]
SortByETLKey,0: Failure during execution of operator logic. [api/operator_rep.C:331]
APT_CombinedOperatorController(1),0: Fatal Error: waitForWriteSignal(): Premature EOF on node v08k40 No such file or directory [iomgr/iocomm.C:1632]
node_node1: Player 6 terminated unexpectedly. [processmgr/player.C:138]
main_program: Unexpected exit status 1 [processmgr/slprocess.C:420]
main_program: Step execution finished with status = FAILED. [sc/sc_api.C:252]
Error when Multiple Instances running concurrently
Hi,
Try checking whether you have temp sort files left over from a previously crashed job.
It might be that only when you run those jobs in parallel does the system try to create a file that already exists, perhaps created by another user, and then you get this error.
IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
![Image](http://www.worldcommunitygrid.org/images/logo.gif)
Clean directories
Thanks roy,
Where should I look for temp sort files?
I am cleaning all the files our job uses and creates between test runs, both when running one instance at a time and when running multiple concurrently. The jobs run cleanly when run one at a time.
Should I be looking for temp files that DSEE creates? If so, where?
I also saw a release note for 7.5.2 which says they fixed something in PX with respect to running multiple instances. I don't know if we are running into this. We are running 7.5.1.
Multiple instance PX job fail with hang (ecase 52860)
---------------------------------------
Fixed to DSD_OshRun.B to correct the handling of the test for the
parallel job script file. This is now done in the correct location and
using a test which does not require write access to the script!
Hi,
Those files reside in the temp directory you configured in your setup (like TMP/TMPDIR).
I'm not sure whether or how the fix you mentioned is relevant in this case.
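As a rough way to do that check, here is a minimal Python sketch that lists old files under the temp directory together with their owner. It assumes the directory comes from the TMPDIR environment variable (falling back to /tmp) and that "leftover" means older than 24 hours; both are illustrative assumptions, and `find_stale_files` is a hypothetical helper, not part of DataStage. It does not rely on any particular temp file naming, so review the output by hand before deleting anything.

```python
import os
import time
from pathlib import Path


def find_stale_files(temp_dir, max_age_hours=24.0, now=None):
    """Return sorted (path, owner_uid, age_hours) for files older than max_age_hours.

    Hypothetical helper: the age threshold is an assumption, not a DataStage rule.
    """
    now = time.time() if now is None else now
    stale = []
    for root, _dirs, files in os.walk(temp_dir):
        for name in files:
            path = Path(root) / name
            try:
                st = path.lstat()
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            age_hours = (now - st.st_mtime) / 3600.0
            if age_hours > max_age_hours:
                stale.append((str(path), st.st_uid, round(age_hours, 1)))
    return sorted(stale)


if __name__ == "__main__":
    # Assumption: the temp directory is taken from TMPDIR, else /tmp.
    temp_dir = os.environ.get("TMPDIR", "/tmp")
    my_uid = os.getuid()
    for path, uid, age in find_stale_files(temp_dir):
        note = "" if uid == my_uid else "  (owned by another user)"
        print(f"{age:8.1f}h  uid={uid}  {path}{note}")
```

Files flagged as owned by another user are the ones most likely to match the scenario described above, where a second instance cannot recreate a file left behind by an earlier run.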
IHTH,
Roy R.