PID Failed
-
- Premium Member
- Posts: 71
- Joined: Mon Nov 13, 2006 12:40 am
Hi Everyone,
I am facing a problem with the PID of a job. I am running a sequencer that contains nearly 25 jobs. The first job in the sequencer, job1, runs on its own, with no other job running in parallel. After it completes, job2 and job3 are started in parallel. Job2 then fails with the PID failed error. The log for the sequencer looks like this:
Control Starting Job Job2 (...)
Warning Job control process (pid 112431) has failed
Control Job Job2.aborted
We sent a description of this error to IBM support, and they asked us to:
clear the &PH& folder.
We cannot find this folder, and we still do not know why we are getting this PID failed error or what the possible solution might be.
If anyone has any information about the folder or this problem, please help.
Thanks in advance,
-
- Premium Member
- Posts: 71
- Joined: Mon Nov 13, 2006 12:40 am
Hi,
Thanks a lot for the quick reply.
I was able to find this directory, but I am not the owner of it. Is there a particular way to clean this directory? Sorry for asking, but I do not know much about it.
The jobs that abort give me no error other than the PID failed warning, after which the job aborts. I am running the jobs on 4 nodes.
Thanks again,
Actually, the &PH& project subdirectory can be cleared from UNIX, even when jobs are running.
-
- Premium Member
- Posts: 71
- Joined: Mon Nov 13, 2006 12:40 am
Hi DSguru/ArndW,
I don't have any idea how to get to TCL; I am sorry about that. Can you please guide me on this?
ArndW,
You said that we can also clear this subdirectory from UNIX, even while jobs are running. Please guide me on this as well.
I am very thankful for all your input.
Thanks a lot for helping me out.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Open your Administrator client. Select your project. Click on the Command button. This opens a window in which you can enter "TCL" commands. Enter the command CLEAR.FILE &PH& and await a response. Close the command window.
PID in this context means "process ID" - this is not the cause of the problem. The problem is that Job2 aborted, and the job control process (job sequence?), which was executing Job2 with process ID 112431, is reporting that fact to you.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
You can use your UNIX "rm" command to remove files in that directory. Nothing untoward will happen if you delete the open file for a running job - except that any information about the running job will be lost. I would use a filter on that directory that just removes anything older than a day, to be sure.
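As a rough illustration of the "older than a day" filter suggested above, here is a minimal Python sketch. The function name `remove_old_files` and the path in the usage note are hypothetical, and using the file modification time as the age is an assumption; adjust both to your installation.

```python
import time
from pathlib import Path


def remove_old_files(directory, max_age_days=1.0, now=None):
    """Delete regular files in `directory` whose modification time is more
    than `max_age_days` old. Returns the sorted names of the removed files."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for entry in Path(directory).iterdir():
        # Only plain files are touched; subdirectories are left alone.
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

It could be called as `remove_old_files("/path/to/your/project/&PH&")`, where the path is a placeholder for your own project directory. Files newer than the cutoff, which would include those of currently running jobs started within the last day, are left in place.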
-
- Premium Member
- Posts: 71
- Joined: Mon Nov 13, 2006 12:40 am
Hi Ray/ArndW/DSguru,
Thanks to you all for all the help that you have been extending.
The problem is that this PID Failed error does not occur in just one job. As I wrote, I have 25 jobs in the sequencer; sometimes the error comes in the first job, sometimes in the 5th job, and sometimes in some other job. In other words, it can occur in any of the 25 jobs. When I last ran the sequencer it gave me the same problem, and the log for the sequencer looks like this:
Occurred: 2:21:32 PM On date: 12/5/2006 Type: Control
Event: Starting Job Seq1. (...)
Occurred: 2:21:32 PM On date: 12/5/2006 Type: Info
Event: Environment variable settings: (...)
Occurred: 2:21:33 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (@Coordinator): Starting new run of checkpointed Sequence job
Occurred: 2:21:33 PM On date: 12/5/2006 Type: RunJob
Event: Seq1 -> (Sybase_FINCAFL_Load_Job): Job run requested (...)
Occurred: 2:21:33 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (DSRunJob): Waiting for job Sybase_FINCAFL_Load_Job to start
Occurred: 2:21:34 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (DSWaitForJob): Waiting for job Sybase_FINCAFL_Load_Job to finish
Occurred: 2:29:29 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (DSWaitForJob): Job Sybase_FINCAFL_Load_Job has finished, status = 1 (Finished OK)
Occurred: 2:29:30 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (@Sybase_FINCAFL_Load_Job): Report on job: Sybase_FINCAFL_Load_Job (...)
Occurred: 2:29:30 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (@Sybase_FINCAFL_Load_Job): Checkpointed run of job 'Sybase_FINCAFL_Load_Job'
Occurred: 2:29:30 PM On date: 12/5/2006 Type: RunJob
Event: Seq1 -> (Sybase_FINHDR_Load_job): Job run requested (...)
Occurred: 2:29:30 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (DSRunJob): Waiting for job Sybase_FINHDR_Load_job to start
Occurred: 2:29:31 PM On date: 12/5/2006 Type: RunJob
Event: Seq1 -> (Sybase_FINASST_Load_Job): Job run requested (...)
Occurred: 2:29:31 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (DSRunJob): Waiting for job Sybase_FINASST_Load_Job to start
Occurred: 2:29:32 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (DSWaitForJob): Waiting for job Sybase_FINHDR_Load_job+Sybase_FINASST_Load_Job to finish
Occurred: 2:29:32 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (DSWaitForJob): Job Sybase_FINHDR_Load_job has finished, status = 3 (Aborted)
Occurred: 2:29:32 PM On date: 12/5/2006 Type: Warning
Event: Seq1..JobControl (@Sybase_FINHDR_Load_job): Job Sybase_FINHDR_Load_job did not finish OK, status = 'Aborted'
Occurred: 2:29:32 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (@Sybase_FINHDR_Load_job): Report on job: Sybase_FINHDR_Load_job (...)
Occurred: 2:29:32 PM On date: 12/5/2006 Type: Warning
Event: Seq1..JobControl (@Sybase_FINHDR_Load_job): Controller problem: Unhandled abort encountered in job Sybase_FINHDR_Load_job
Occurred: 2:29:32 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (@Sybase_FINHDR_Load_job): Will execute error activity: FailCase_EA
Occurred: 2:29:33 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (DSSendMail): Sent message to 'punardeeps@hcl.in,ashikm@hcl.in,amalarpova@hcl.in' (...)
Occurred: 2:29:33 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (@ExceptionMail_NA): Omitted checkpoint for call of routine 'DSSendMail'
Occurred: 2:29:33 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (DSWaitForJob): Waiting for job Sybase_FINASST_Load_Job to finish
Occurred: 2:29:33 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (DSWaitForJob): Job Sybase_FINASST_Load_Job has finished, status = 3 (Aborted)
Occurred: 2:29:33 PM On date: 12/5/2006 Type: Warning
Event: Seq1..JobControl (@Sybase_FINASST_Load_Job): Job Sybase_FINASST_Load_Job did not finish OK, status = 'Aborted'
Occurred: 2:29:33 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (@Sybase_FINASST_Load_Job): Report on job: Sybase_FINASST_Load_Job (...)
Occurred: 2:29:33 PM On date: 12/5/2006 Type: Info
Event: Seq1..JobControl (@Coordinator): Summary of sequence run (...)
Occurred: 2:29:33 PM On date: 12/5/2006 Type: Fatal
Event: Seq1..JobControl (fatal error from @Coordinator): Sequence job (restartable) will abort due to previous unrecoverable errors
Occurred: 2:29:33 PM On date: 12/5/2006 Type: Warning
Event: Attempting to Cleanup after ABORT raised in stage Seq1..JobControl
Occurred: 2:29:33 PM On date: 12/5/2006 Type: Control
Event: Job Seq1 aborted.
End of report.
So 2 jobs got the PID Failed error. The log for the first job (Sybase_FINHDR_Load_job) looks like this:
Occurred: 2:29:30 PM On date: 12/5/2006 Type: Control
Event: Starting Job Sybase_FINHDR_Load_job. (...)
Occurred: 2:29:32 PM On date: 12/5/2006 Type: Warning
Event: Job control process (pid 164150) has failed
Occurred: 2:29:32 PM On date: 12/5/2006 Type: Control
Event: Job Sybase_FINHDR_Load_job. aborted
and the log for the second job (Sybase_FINASST_Load_Job) looks like this:
Occurred: 2:29:32 PM On date: 12/5/2006 Type: Control
Event: Starting Job Sybase_FINASST_Load_Job. (...)
Occurred: 2:29:33 PM On date: 12/5/2006 Type: Warning
Event: Job control process (pid 1781864) has failed
Occurred: 2:29:33 PM On date: 12/5/2006 Type: Control
Event: Job Sybase_FINASST_Load_Job. aborted
End of report.
I am sorry for such a long post, but I think all this description helps explain my problem. The main issue is that I am getting the PID failed error randomly: one time it is in one job, and another time it is in some other job. Can you please tell me what the possible reason could be? I have not been able to solve this for quite some time. I believe that with your help I will be able to get through it.
Thanks a lot for all your help.
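For long logs like the ones above, a small script can pull out just the Warning and Fatal entries. This is only a sketch that assumes the log has been saved as text in the same "Occurred ... Type" / "Event" two-line layout shown in this post; `parse_log` and `problems` are hypothetical helper names, not part of any DataStage tool.

```python
import re

# Matches header lines such as:
#   Occurred: 2:29:32 PM On date: 12/5/2006 Type: Warning
HEADER = re.compile(
    r"Occurred:\s*(?P<time>.+?)\s+On date:\s*(?P<date>\S+)\s+Type:\s*(?P<type>\S+)"
)


def parse_log(text):
    """Turn the pasted log text into a list of dicts with the keys
    time, date, type and event."""
    entries = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        match = HEADER.match(line)
        if match:
            current = match.groupdict()
            current["event"] = ""
            entries.append(current)
        elif line.startswith("Event:") and current is not None:
            # Attach the event text to the most recent header line.
            current["event"] = line[len("Event:"):].strip()
    return entries


def problems(entries):
    """Keep only the entries whose type is Warning or Fatal."""
    return [e for e in entries if e["type"] in ("Warning", "Fatal")]
```

Running `problems(parse_log(text))` on the log of one of the aborted jobs would leave only the "Job control process (pid ...) has failed" warning, which makes it easier to compare runs.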
-
- Premium Member
- Posts: 71
- Joined: Mon Nov 13, 2006 12:40 am
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Premium Member
- Posts: 71
- Joined: Mon Nov 13, 2006 12:40 am
Hi Ray,
For the job Sybase_FINHDR_Load_job, I am getting the following log. This is the full log for the job run:
Occurred: 2:29:30 PM On date: 12/5/2006 Type: Control
Event: Starting Job Sybase_FINHDR_Load_job. (...)
Occurred: 2:29:32 PM On date: 12/5/2006 Type: Warning
Event: Job control process (pid 164150) has failed
Occurred: 2:29:32 PM On date: 12/5/2006 Type: Control
Event: Job Sybase_FINHDR_Load_job. aborted
-----------------------------------------------------------
For the job Sybase_FINASST_Load_Job, I am getting the following log. This is the full log for the job run:
Occurred: 2:29:32 PM On date: 12/5/2006 Type: Control
Event: Starting Job Sybase_FINASST_Load_Job. (...)
Occurred: 2:29:33 PM On date: 12/5/2006 Type: Warning
Event: Job control process (pid 1781864) has failed
Occurred: 2:29:33 PM On date: 12/5/2006 Type: Control
Event: Job Sybase_FINASST_Load_Job. aborted
-----------------------------------------------------------
These are the logs for the 2 jobs that aborted.
I hope I was able to give you the information which you asked for.
Thanks a lot for all your inputs.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Not much to go on there. Are Sybase_FINHDR_Load_job and Sybase_FINASST_Load_Job jobs or job sequences? In either case, please set APT_PM_SHOW_PIDS to True before executing again - that way you will be able to work out which process was executing what.
Hi All,
I came across this some time back, and the cause we found was weird.
In my case, I had a sequence which called about 5 jobs. When the sequence was run it would sometimes abort with a PID failure error, and sometimes it would run fine.
What we found was that the environment variable $APT_DUMP_SCORE was set to TRUE in the job, whereas at the project level it was set to FALSE.
When the value was either set to TRUE at the project level or set to FALSE in the job, the sequence would run fine, and it has been running fine since (with the value set to FALSE in the job). They had opened a case with IBM regarding this, but I am not sure what became of it.
Maybe it's worth checking whether there are any such environment variables in the job.
Aneesh
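If it helps, the comparison described above can be done by hand or with a few lines of Python. This sketch does not read anything from DataStage; it assumes you have copied the project-level defaults and the job-level values into two plain dictionaries yourself, and `conflicting_overrides` is a hypothetical helper name.

```python
def conflicting_overrides(project_defaults, job_settings):
    """Return {name: (project_value, job_value)} for every variable that the
    job sets to a value different from the project-level default."""
    conflicts = {}
    for name, job_value in job_settings.items():
        if name in project_defaults and project_defaults[name] != job_value:
            conflicts[name] = (project_defaults[name], job_value)
    return conflicts
```

With the project dictionary holding `APT_DUMP_SCORE` as "False" and the job dictionary holding it as "True", the result would contain that one variable, which is the kind of mismatch that was present in my case.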