Hi,
A few months back, many of our parallel jobs were just hanging on the
OSH script (...) step. We opened a PMR with IBM, and they suggested installing Fix Pack 3 (we were running version 8.0.1 Fix Pack 1). We installed Fix Pack 3, and for the past month we have been receiving the following error:
main_program: Fatal Error: Unable to start ORCHESTRATE job:
APT_PMwaitForPlayersToStart failed while waiting for players to confirm
startup. This likely indicates a network problem.
Status from APT_PMpoll is 0; node name is node0
Once one parallel job gets this error, no other parallel job will run until we bounce the server (I have a dummy job that tests this with a Row Generator going to a Peek). We are currently bouncing the server 3 to 4 times a day so that our ETL processes can run in production.
This has no effect on our server jobs. A new PMR has been opened with IBM, but we are getting nowhere: the only suggestion so far is that upgrading to 8.5 may resolve the issue. They had us turn off the McAfee virus scan on certain directories, thinking that might be the culprit, but that did not help.
I have read the other posts for this error and did not find much.
Any suggestions would be appreciated.
Thanks - -John
Unable to start ORCHESTRATE job
-
- Premium Member
- Posts: 306
- Joined: Wed Jun 21, 2006 11:41 am
John,
I would have IBM support focus on the dsdlockd process, which can cause hangs like the ones you are experiencing. It may not be your issue, but it is worth the time to investigate.
We had this exact issue, and it was related to the ownership of the dsdlockd process. We are on Unix, but this may be an issue on Windows too.
Worth a shot.
Mike Hester
mhester@petra-ps.com
IBM just got back with the following:
1) A patch is being created to backport the environment variables APT_PM_PLAYER_CONNECT_TIMEOUT and APT_PM_PLAYER_TIMEOUT, which will let the system keep running jobs even when it is overloaded
2) Get an ETA for the patch
3) Set up a conference call with Prudential to explain what this patch will actually do and what suggestions we have going forward.
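For anyone following along: once the patch is in, item 1 presumably comes down to setting two environment variables before the parallel engine starts a job. Below is a minimal Python sketch, assuming the patched engine reads these variables from the environment of the process that launches the job. The value of 120 (assumed to be seconds), the helper names `env_with_timeouts` and `run_with_timeouts`, and the launcher command are all hypothetical; confirm the real units and recommended values with IBM support.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence

# Hypothetical override values (assumed to be seconds); confirm with IBM support.
TIMEOUT_OVERRIDES = {
    "APT_PM_PLAYER_CONNECT_TIMEOUT": "120",
    "APT_PM_PLAYER_TIMEOUT": "120",
}


def env_with_timeouts(base: Optional[Mapping[str, str]] = None,
                      overrides: Mapping[str, str] = TIMEOUT_OVERRIDES) -> dict:
    """Return a copy of `base` (default: os.environ) with the overrides applied.

    The mapping passed in is copied, never modified.
    """
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


def run_with_timeouts(cmd: Sequence[str]) -> int:
    """Run `cmd` (a hypothetical job launch command) with the overrides set.

    Returns the process exit code.
    """
    result = subprocess.run(list(cmd), env=env_with_timeouts())
    return result.returncode
```

For example, `run_with_timeouts(["dsjob", "-run", "-jobstatus", "MyProject", "DummyRowGenPeek"])` would start the dummy test job with the overrides in place. `MyProject` and `DummyRowGenPeek` are placeholder names. Copying the environment rather than editing `os.environ` keeps the overrides scoped to that one launch.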
I will update when we have the patch available and installed.
Thanks - - John