Unable to start ORCHESTRATE job

JPalatianos
Premium Member
Posts: 306
Joined: Wed Jun 21, 2006 11:41 am

Unable to start ORCHESTRATE job

Post by JPalatianos »

Hi,
A few months back, many of our parallel jobs were hanging on the OSH script (...) step. We opened a PMR with IBM, and they suggested installing Fix Pack 3; we were running version 8.0.1 Fix Pack 1. We installed Fix Pack 3, and for the past month we have been receiving the following error:

main_program: Fatal Error: Unable to start ORCHESTRATE job:
APT_PMwaitForPlayersToStart failed while waiting for players to confirm
startup. This likely indicates a network problem.
Status from APT_PMpoll is 0; node name is node0

Once one parallel job gets this error, no other parallel job will run until we bounce the server. (I have a dummy job that tests this, with a Row Generator going to a Peek.) We are currently bouncing the server 3 to 4 times a day to allow our ETL processes to run in production.
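
For anyone who wants to automate the check for this condition, here is a minimal sketch in Python. It assumes the job log is available as plain text. The function name is hypothetical, and the match strings are simply copied from the error quoted above, so treat it as an illustration rather than anything IBM supplies.

```python
import re

# Signature strings copied from the error message quoted in this post.
FATAL_RE = re.compile(r"Fatal Error: Unable to start ORCHESTRATE job")
PLAYER_RE = re.compile(r"APT_PMwaitForPlayersToStart failed")
NODE_RE = re.compile(r"node name is (\S+)")


def find_player_startup_failure(log_text):
    """Return the node name if the log shows the player-startup failure.

    Returns None when the failure signature is absent, and "unknown"
    when the failure is present but no "node name is ..." line is found.
    (Hypothetical helper; not part of DataStage.)
    """
    if not (FATAL_RE.search(log_text) and PLAYER_RE.search(log_text)):
        return None
    match = NODE_RE.search(log_text)
    return match.group(1) if match else "unknown"


if __name__ == "__main__":
    sample = (
        "main_program: Fatal Error: Unable to start ORCHESTRATE job:\n"
        "APT_PMwaitForPlayersToStart failed while waiting for players to confirm\n"
        "startup. This likely indicates a network problem.\n"
        "Status from APT_PMpoll is 0; node name is node0\n"
    )
    print(find_player_startup_failure(sample))
```

A scheduler could run the dummy job, pass its log text to this function, and raise an alert when the result is not None.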

This has no effect on our server jobs. A new PMR has been opened with IBM, but we are getting nowhere with them beyond the suggestion that upgrading to 8.5 may resolve the issue. They had us turn McAfee virus scanning off on certain directories, thinking that might be the culprit, but that did not help.

I have read the other posts for this error and did not find much.

Any suggestions would be appreciated.
Thanks - -John
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

John,

I might have IBM support focus on the dsdlockd process. This can cause hangs like what you are experiencing. This may not be your issue, but worth the time to investigate.

We had this very issue, and it was related to the ownership of the dsdlockd process. We are on Unix, but maybe this is a Windows issue too.

Worth a shot.
JPalatianos
Premium Member
Posts: 306
Joined: Wed Jun 21, 2006 11:41 am

Post by JPalatianos »

Mike,
Thank you for that information. I will let IBM know and see what they come back with.
Thanks - - John
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

You are most welcome - report back and let us know what they find.
JPalatianos
Premium Member
Posts: 306
Joined: Wed Jun 21, 2006 11:41 am

Post by JPalatianos »

IBM just got back with the following:

1) Patch being created to backport the environment variables - APT_PM_PLAYER_CONNECT_TIMEOUT, APT_PM_PLAYER_TIMEOUT - which will let the system keep running jobs even though it is overloaded
2) Get ETA for patch
3) Setup conference call with Prudential to explain what this patch will actually do and what suggestions we have going forward.
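
To illustrate item 1, here is a rough Python sketch of passing those two variables to a child process environment. The variable names come from IBM's note above. The helper name, the example values, and the assumption that the values are plain numbers are mine and are not confirmed by IBM. In practice these would more likely be set at the project or job level once the patch is in place.

```python
import os


def with_player_timeouts(base_env, connect_timeout, player_timeout):
    """Return a copy of base_env with the two timeout variables added.

    Hypothetical helper. The variable names are from IBM's note; the
    units and sensible values are assumptions and should be confirmed
    with IBM support before use.
    """
    env = dict(base_env)  # copy, so the caller's mapping is not modified
    env["APT_PM_PLAYER_CONNECT_TIMEOUT"] = str(connect_timeout)
    env["APT_PM_PLAYER_TIMEOUT"] = str(player_timeout)
    return env


if __name__ == "__main__":
    # 120 and 300 are placeholder values, not recommendations.
    env = with_player_timeouts(os.environ, 120, 300)
    for name in ("APT_PM_PLAYER_CONNECT_TIMEOUT", "APT_PM_PLAYER_TIMEOUT"):
        print(name, "=", env[name])
```

The resulting dictionary could be handed to `subprocess.run(..., env=env)` by whatever script launches the job.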


I will update when we have the patch available and installed.
Thanks - - John
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

John,

Thanks for the update!
JPalatianos
Premium Member
Posts: 306
Joined: Wed Jun 21, 2006 11:41 am

Post by JPalatianos »

The latest on this issue: IBM has narrowed the problem to the MKS Toolkit. They had us generate many logs for them to analyze, and they should be getting back to us with a resolution.
JPalatianos
Premium Member
Posts: 306
Joined: Wed Jun 21, 2006 11:41 am

Post by JPalatianos »

Hi,
Per IBM, we applied an upgrade to our MKS Toolkit (via a patch supplied to us by IBM), and 2 weeks later it seems to have taken care of our Orchestrate issues.
Thanks - - John