Job control process (pid xxxxx) has failed

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Manfred Hagedorn
Participant
Posts: 58
Joined: Wed Apr 04, 2007 10:02 am

Post by Manfred Hagedorn »

Sometimes (e.g. today) jobs fail with the message:
Job control process (pid xxxxx) has failed.

I have searched this forum and read dozens of posts, but nothing really gave me an answer.

Let me describe what I see:
Today 3 jobs show this problem. It is the first run of these jobs after restarting DataStage yesterday.
In the past, a restart of DataStage usually solved this problem. Today a further restart doesn't help.
Sometimes even a recompile of the affected jobs solved it. Today this doesn't help either.
Finally I renamed the job, and then it worked for one job.
For the other jobs even a rename didn't help.
The failed jobs are part of a sequence. When I start the sequence with other parameters, so that one of the pre-processing jobs is not executed, it works.
Also, when I start the job with a different AptConfig (fewer nodes), it works (sometimes).
I also already removed some old files from &PH&, but only a few very old ones.

As you can see, I have tested a lot, but I don't really understand what is behind this.

Manfred
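
For readers who want to review old files in &PH& before removing any, here is a minimal Python sketch. It only lists candidates and deletes nothing. The directory path and the 30-day age limit are assumptions for illustration, not values from this thread.

```python
import time
from pathlib import Path


def stale_files(directory, max_age_days, now=None):
    """Return files in `directory` last modified more than `max_age_days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    # Hypothetical location; substitute the &PH& directory of your own project.
    ph_dir = Path("/path/to/project/&PH&")
    if ph_dir.is_dir():
        for path in stale_files(ph_dir, max_age_days=30):
            print(path)  # review the list before deleting anything
```

Listing first and deleting by hand afterwards keeps the check harmless if it is run while jobs are still active.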
Manfred Hagedorn
Participant
Posts: 58
Joined: Wed Apr 04, 2007 10:02 am

Post by Manfred Hagedorn »

Hello,
after further testing, my preferred workaround so far is changing the AptConfig (that is, changing or reducing the number of nodes).
Can anybody confirm this and tell me why it helps?
Manfred
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

There should be other messages in the log file which hint at the cause of the problem; the message you posted only documents an effect. How heavily loaded is your system? Are you getting timeouts in DataStage?
Manfred Hagedorn
Participant
Posts: 58
Joined: Wed Apr 04, 2007 10:02 am

Post by Manfred Hagedorn »

Hello,
no, we don't get timeouts. The system has been running fine for months; only this problem exists.
For your information, here is some output from the logs in &PH&:

DataStage Job 2338 Phantom 570
CRITICAL ERROR! Notify the system administrator.

DataStage Job 2330 Phantom 15605
Program "DSD.RUN": Line 2220, Variable previously undefined. Zero length string used.
[15674] Done : DSD.RUN JB_SLIM_Get_MIS_Date.Get_MIS_Date_DRI_276_ED_INV_CUSTODY_ACT 0/50/1/0/0
jobnotify: Unknown error
Job Aborted after Fatal Error logged.
Program "DSD.WriteLog": Line 250, Abort.
[16576] DSD.RUN JB_INV_CUSTODY_ACT. 0/50/1/0/0 - core dumped.
Attempting to Cleanup after ABORT raised in stage SEQ_SLIM_INV_CUSTODY_ACT..JobControl

DataStage Phantom Aborting with @ABORT.CODE = 1

DataStage Job 2333 Phantom 29195
Program "DSD.RUN": Line 2220, Variable previously undefined. Zero length string used.
[29212] Done : DSD.RUN JB_SLIM_Get_MIS_Date.Get_MIS_Date_DRI_276_ED_INV_CUSTODY_ACT 0/50/1/0/0
jobnotify: Unknown error
Job Aborted after Fatal Error logged.
Program "DSD.WriteLog": Line 250, Abort.
[570] DSD.RUN JB_INV_CUSTODY_ACTx. 0/50/1/0/0 - core dumped.
Attempting to Cleanup after ABORT raised in stage SEQ_SLIM_INV_CUSTODY_ACT..JobControl

DataStage Phantom Aborting with @ABORT.CODE = 1
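
When there are many such phantom logs, the abort and core-dump lines can be pulled out with a small script. The following is a rough Python sketch: the two patterns are inferred only from the lines quoted in this post and are an assumption about the format, not a documented log grammar. The names `scan`, `CORE_DUMP` and `ABORT` are made up for the example.

```python
import re

# Patterns inferred from the sample lines in this thread (an assumption,
# not an official description of the phantom log format).
CORE_DUMP = re.compile(
    r"^\[(?P<pid>\d+)\]\s+(?P<program>\S+)\s+(?P<target>\S+)\s.*core dumped"
)
ABORT = re.compile(r'^Program "(?P<program>[^"]+)": Line (?P<line>\d+), Abort\.')


def scan(text):
    """Return a list of tuples describing abort and core-dump lines in `text`."""
    findings = []
    for raw in text.splitlines():
        line = raw.strip()
        m = CORE_DUMP.match(line)
        if m:
            findings.append(("core_dump", int(m["pid"]), m["target"]))
            continue
        m = ABORT.match(line)
        if m:
            findings.append(("abort", m["program"], int(m["line"])))
    return findings


# Three lines copied from the log output above.
SAMPLE = """\
Program "DSD.RUN": Line 2220, Variable previously undefined. Zero length string used.
Program "DSD.WriteLog": Line 250, Abort.
[16576] DSD.RUN JB_INV_CUSTODY_ACT. 0/50/1/0/0 - core dumped.
"""

if __name__ == "__main__":
    for finding in scan(SAMPLE):
        print(finding)
```

The "Variable previously undefined" line is deliberately not reported, so only the abort and the core dump show up in the result.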
bhargav_dd
Premium Member
Posts: 57
Joined: Tue Jun 30, 2009 9:38 am

Post by bhargav_dd »

Hello, try setting the variable ps_debug to 1 and then run the job again.
Manfred Hagedorn
Participant
Posts: 58
Joined: Wed Apr 04, 2007 10:02 am

Post by Manfred Hagedorn »

Hello,
sorry, where do I set PS_DEBUG?
And what do I do then?
Manfred
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

The variable is PX_DEBUG. Set it to 1 in Administrator -> User Defined variables.
Manfred Hagedorn
Participant
Posts: 58
Joined: Wed Apr 04, 2007 10:02 am

Post by Manfred Hagedorn »

Hello,
thanks a lot!
I understand that this parameter is meant to produce more log information for this PID error.
As the jobs are currently running (anyhow), I will set this parameter the next time this issue comes up.

I will leave this thread open until then (and definitely update it later).

Manfred
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The error regarding the previously undefined variable is an old one (I'm really surprised it isn't fixed yet), but the abort within DSD.WriteLog shouldn't be happening.

If the mount point for your project directory is not close to full, then it might be a corrupt log file. The easiest way to fix that is to save your file under another name, delete the original, and then rename and recompile/run the copy.
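
To rule out the "close to full" case first, the fill level of the filesystem holding the project directory can be checked with a few lines of Python. This is a minimal sketch: the 90% threshold is an arbitrary assumption, and the path "." is only a stand-in for your own project directory.

```python
import shutil


def percent_used(path):
    """Return how full the filesystem containing `path` is, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # guard against pseudo-filesystems reporting zero size
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold=90.0):
    """True if the filesystem holding `path` is at or above `threshold` percent."""
    return percent_used(path) >= threshold


if __name__ == "__main__":
    project_dir = "."  # stand-in; replace with your project directory
    print(f"{percent_used(project_dir):.1f}% used")
    print("nearly full" if is_nearly_full(project_dir) else "space looks fine")
```

If the filesystem turns out to have plenty of free space, the corrupt log file explanation becomes the more likely one.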
Manfred Hagedorn
Participant
Posts: 58
Joined: Wed Apr 04, 2007 10:02 am

Post by Manfred Hagedorn »

Hello,
well, I set PX_DEBUG=1 and the job failure happened again.
But I don't see anything more in the logs now, or am I missing something?
Manfred