Job control process (pid xxxxx) has failed
-
- Participant
- Posts: 58
- Joined: Wed Apr 04, 2007 10:02 am
Sometimes (e.g. today) jobs fail with the message:
Job control process (pid xxxxx) has failed.
I have searched through this forum and read dozens of posts, but nothing really gave me an answer.
Let me describe what I see:
Today 3 jobs show this problem. It is the first run of these jobs after restarting DataStage yesterday.
In the past, a restart of DataStage usually solved this problem. Today a further restart doesn't help.
Sometimes even a recompile of the affected jobs solved it. Today this doesn't help.
Finally I renamed the job, and then it worked for one job.
For the other jobs even a rename didn't help.
The failed jobs are part of a sequence. When I start the sequence with other parameters, so that one of the pre-processing jobs is not executed, it works.
Also, when I start the job with a different AptConfig (fewer nodes) it works (sometimes).
I have also already removed some old files from &PH&, but only a few very old ones.
You see, I have tested a lot! But I don't really understand what is behind this.
Manfred
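For readers who want to review what is sitting in &PH& before deleting anything, here is a minimal sketch in Python. It only lists files older than a chosen age and removes nothing. The function name `old_files`, the 30-day default in the command-line part, and the idea of passing the &PH& path as an argument are assumptions for illustration and are not taken from the posts above.

```python
import sys
import time
from pathlib import Path


def old_files(directory, max_age_days, now=None):
    """Return the regular files in `directory` last modified more than
    `max_age_days` days ago, sorted by path. Nothing is deleted."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    # Hypothetical usage: pass the project's &PH& directory as the first
    # argument (quote it in the shell, since '&' is special there).
    if len(sys.argv) > 1:
        for path in old_files(sys.argv[1], max_age_days=30):
            print(path)
```

Listing first and deleting by hand afterwards keeps the step reversible, which seems prudent while the cause of the failures is still unknown.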
-
- Participant
- Posts: 58
- Joined: Wed Apr 04, 2007 10:02 am
There should be other messages in the log file which hint at the cause of the problem; the message you posted only documents an effect. How heavily loaded is your system? Are you getting timeouts in DataStage?
-
- Participant
- Posts: 58
- Joined: Wed Apr 04, 2007 10:02 am
Hello,
No, we don't get timeouts. The system has been running fine for months. Only this problem exists.
For your information, here is some output of the logs from &PH&:
DataStage Job 2338 Phantom 570
CRITICAL ERROR! Notify the system administrator.
DataStage Job 2330 Phantom 15605
Program "DSD.RUN": Line 2220, Variable previously undefined. Zero length string used.
[15674] Done : DSD.RUN JB_SLIM_Get_MIS_Date.Get_MIS_Date_DRI_276_ED_INV_CUSTODY_ACT 0/50/1/0/0
jobnotify: Unknown error
Job Aborted after Fatal Error logged.
Program "DSD.WriteLog": Line 250, Abort.
[16576] DSD.RUN JB_INV_CUSTODY_ACT. 0/50/1/0/0 - core dumped.
Attempting to Cleanup after ABORT raised in stage SEQ_SLIM_INV_CUSTODY_ACT..JobControl
DataStage Phantom Aborting with @ABORT.CODE = 1
DataStage Job 2333 Phantom 29195
Program "DSD.RUN": Line 2220, Variable previously undefined. Zero length string used.
[29212] Done : DSD.RUN JB_SLIM_Get_MIS_Date.Get_MIS_Date_DRI_276_ED_INV_CUSTODY_ACT 0/50/1/0/0
jobnotify: Unknown error
Job Aborted after Fatal Error logged.
Program "DSD.WriteLog": Line 250, Abort.
[570] DSD.RUN JB_INV_CUSTODY_ACTx. 0/50/1/0/0 - core dumped.
Attempting to Cleanup after ABORT raised in stage SEQ_SLIM_INV_CUSTODY_ACT..JobControl
DataStage Phantom Aborting with @ABORT.CODE = 1
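Since the interesting lines are buried among routine ones, a small filter can help when going through many &PH& files. The following Python sketch is only an illustration: the marker strings are copied from the output above, the reading of the header line as "job number, phantom pid" is an assumption, and `failure_lines` is a hypothetical helper name. Each hit is paired with the most recent header seen, which may not be the process that actually failed.

```python
import re

# Header lines as they appear in the output above, e.g.
# "DataStage Job 2338 Phantom 570" (interpretation of the two numbers
# as job number and phantom pid is an assumption).
HEADER = re.compile(r"^DataStage Job (\d+) Phantom (\d+)")

# Marker strings taken from the posted log; matching is case-sensitive.
MARKERS = ("CRITICAL ERROR", "core dumped", "Job Aborted", "Abort.")


def failure_lines(lines):
    """Return (job_no, phantom_pid, line) for every line that contains
    one of MARKERS, tagged with the most recent header seen before it."""
    job = pid = None
    found = []
    for raw in lines:
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            job, pid = match.group(1), match.group(2)
            continue
        if any(marker in line for marker in MARKERS):
            found.append((job, pid, line))
    return found
```

A file can be scanned with `failure_lines(open(path, errors="replace"))`, where `path` is whichever &PH& file is under inspection.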
-
- Premium Member
- Posts: 57
- Joined: Tue Jun 30, 2009 9:38 am
Manfred Hagedorn wrote: [&PH& log output as posted above]
Hello, try setting the variable ps_debug to 1 and then run the job again.
-
- Participant
- Posts: 58
- Joined: Wed Apr 04, 2007 10:02 am
The error regarding the previously undefined variable is an old one (I'm really surprised it isn't fixed yet), but the abort within the DSD.WriteLog shouldn't be happening.
If the mount point for your project directory is not close to full, then it might be a corrupt log file. The easiest way to fix that is to save your file under another name, delete the original, and then rename and recompile/run the copy.
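The "close to full" check can be done with `df` on the server, or with a few lines of Python as sketched here. This is a minimal example under stated assumptions: the 90% threshold is arbitrary, the helper names are hypothetical, and the path you pass should be your own project directory. The figure may differ slightly from what `df` reports, because file systems can reserve blocks.

```python
import shutil


def percent_used(path):
    """Return the percentage of the file system holding `path` that is used."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold=90.0):
    """True if the file system holding `path` is at or above `threshold` percent."""
    return percent_used(path) >= threshold
```

For example, `is_nearly_full("/path/to/your/project")` returns `True` once usage reaches the threshold; the path shown is a placeholder.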