Error when Multiple Instances running concurrently

Post questions here related to DataStage Enterprise/PX Edition in areas such as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

jking123
Premium Member
Posts: 29
Joined: Tue Mar 23, 2004 9:18 pm

Post by jking123 »

We are getting the following error when running multiple instances of our job concurrently, even with only two instances running in parallel. We have already checked MFILES and similar settings. The failure seems to happen in either a Sort stage or a Join stage.

node_node1: Player 4 terminated unexpectedly. [processmgr/player.C:138]
main_program: Unexpected termination by Unix signal 10(SIGBUS) [processmgr/slprocess.C:425]
SortByETLKey,0: Failure during execution of operator logic. [api/operator_rep.C:331]
APT_CombinedOperatorController(1),0: Fatal Error: waitForWriteSignal(): Premature EOF on node v08k40 No such file or directory [iomgr/iocomm.C:1632]
node_node1: Player 6 terminated unexpectedly. [processmgr/player.C:138]
main_program: Unexpected exit status 1 [processmgr/slprocess.C:420]
main_program: Step execution finished with status = FAILED. [sc/sc_api.C:252]
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Try checking whether there are temp sort files left over from a previously crashed job.
It may be that only when you run those jobs in parallel does the system try to create a file that already exists, perhaps one created by another user, and then you get this error.
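A minimal sketch of such a check, written in Python as an illustration only. The thread does not say how the engine names its temp sort files, so instead of matching a name pattern (which would be a guess) it lists regular files in a directory that have not been modified for a while and records their owner, so leftovers from a crashed run, or files belonging to another user, stand out. The function names, the 24-hour default and the idea of filtering by age are all assumptions; it only reports and never deletes anything.

```python
import os
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LeftoverFile:
    path: str
    owner_uid: int    # numeric Unix owner of the file
    age_hours: float  # hours since the file was last modified


def find_leftover_files(directory: str, min_age_hours: float = 24.0,
                        now: Optional[float] = None) -> List[LeftoverFile]:
    """Return regular files directly inside `directory` that have not been
    modified for at least `min_age_hours`, oldest first.
    Subdirectories and symlinks are skipped, not descended into."""
    now = time.time() if now is None else now
    found: List[LeftoverFile] = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue  # skip directories, symlinks, sockets, etc.
            try:
                st = entry.stat(follow_symlinks=False)
            except FileNotFoundError:
                continue  # temp file vanished between listing and stat
            age_hours = (now - st.st_mtime) / 3600.0
            if age_hours >= min_age_hours:
                found.append(LeftoverFile(entry.path, st.st_uid, age_hours))
    found.sort(key=lambda f: f.age_hours, reverse=True)
    return found


def owned_by_someone_else(f: LeftoverFile) -> bool:
    """True if the file belongs to a different Unix user than the caller
    (the "perhaps with another user" case above)."""
    return f.owner_uid != os.getuid()
```

For example, find_leftover_files("/tmp") with no job running, then review the list by hand before removing anything.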

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
jking123
Premium Member
Posts: 29
Joined: Tue Mar 23, 2004 9:18 pm

Clean directories

Post by jking123 »

Thanks Roy,
Where should I look for temp sort files?
Between test runs (one instance at a time versus several concurrently) I clean up all the files our job uses and creates. The jobs run cleanly when run one at a time.
Should I be looking for temp files that DSEE itself creates? If so, where?
I also saw a release note in 7.5.2 saying they fixed something in PX related to running multiple instances. I don't know whether we are running into this; we are on 7.5.1.

Multiple instance PX job fail with hang (ecase 52860)
---------------------------------------
Fixed to DSD_OshRun.B to correct the handling of the test for the
parallel job script file. This is now done in the correct location and
using a test which does not require write access to the script!
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Those files reside in the temp directory configured in your environment (for example, the directory that TMP or TMPDIR points to).
I'm not sure whether the fix you mentioned is relevant in this case. :roll:
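To collect the places worth checking, here is a rough Python sketch. It takes TMPDIR and TMP from the environment, as described above, and falls back to /tmp when neither is set (the usual Unix default, assumed here). It also lists the `resource scratchdisk "..."` entries from the parallel configuration file named by APT_CONFIG_FILE, since, as far as I know, the Sort stage writes its scratch files there when scratch disks are defined. The regex assumes the usual `resource scratchdisk "/path" {pools ""}` syntax. The function names are hypothetical.

```python
import os
import re
from typing import List, Mapping, Optional

# Assumed config-file syntax:  resource scratchdisk "/ds/scratch" {pools ""}
# ("resource disk" entries are deliberately not matched.)
SCRATCHDISK_RE = re.compile(r'resource\s+scratchdisk\s+"([^"]+)"')


def load_apt_config_text(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the file named by APT_CONFIG_FILE; return "" if unset or missing."""
    env = os.environ if env is None else env
    path = env.get("APT_CONFIG_FILE")
    if not path or not os.path.isfile(path):
        return ""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.read()


def temp_dir_candidates(env: Optional[Mapping[str, str]] = None,
                        config_text: str = "") -> List[str]:
    """Directories to inspect for leftover temp/sort files, without duplicates:
    TMPDIR/TMP (or /tmp if neither is set), then every scratchdisk entry."""
    env = os.environ if env is None else env
    env_dirs = [env[v] for v in ("TMPDIR", "TMP") if env.get(v)]
    if not env_dirs:
        env_dirs = ["/tmp"]  # assumed default when no variable is set
    result: List[str] = []
    for d in env_dirs + SCRATCHDISK_RE.findall(config_text):
        if d not in result:
            result.append(d)
    return result
```

Something like temp_dir_candidates(config_text=load_apt_config_text()) run as the DataStage user gives the list of directories to check. Keep in mind that each job can use a different APT_CONFIG_FILE.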
IHTH,
Roy R.