Control Job Aborts Unexpectedly - DSD.WriteLog


Post by nick.bond »

Hi,

We have a job "ExecJobJC" which calls a routine ExecJob to run other jobs.

It has been working fine for a very long time, and suddenly, as we are performing the real migration, it is falling over when trying to call some jobs. It is intermittent; if the job is restarted after the abort it will usually run fine.

When ExecJobJC is reset, the following message is found in the log:
From previous run
DataStage Job 9 Phantom 445
Job Aborted after Fatal Error logged.
Program "DSD.WriteLog": Line 250, Abort.
Attempting to Cleanup after ABORT raised in stage ExecJobJC.TfmGemSF_ACCOUNT_NPBA_0000_08.JobControl

DataStage Phantom Aborting with @ABORT.CODE = 1
Things I have already tried to fix the issue:
  • Stopped DS Server, cleared sockets and restarted.
  • Re-promoted job and routine so they would be compiled again.
  • Cleared log file for the job: "CLEAR.FILE RT_LOG9".
  • Cleared &PH&.
We are running the same process as we have done many times, with no changes to this code or the setup of the machine.

From reading posts on DSXchange I have only one last thought as to what it might be, and that is that too many files are open, but our TFILES = 2500.

Is there some way I can check how many files are currently open?

Any more suggestions as to what I could try?

Thanks, Nick.
Regards,

Nick.

Post by ray.wurlod »

The number of dynamic hashed files open on the system is returned by the command

Code: Select all

$DSHOME/bin/analyze.shm -d | wc -l
That DSD.WriteLog() failed would ordinarily suggest that the log is the culprit, but your CLEAR.FILE should have fixed most known log problems.
The obvious question to ask is "what's changed?" since this job worked. I am guessing, from your low job number, that you don't have an issue with too many jobs (and therefore sub-directories) in the project.
Is it just this job, or all jobs? Is it just this project, or all projects on this machine, that manifest this problem?
Since DSD.WriteLog() is part of DataStage you will need to involve your support provider, just in case you've uncovered a bug.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by chulett »

nick.bond wrote: It has been working fine for a very long time and suddenly as we are performing the real migration it is falling over when trying to call some jobs.
Could you explain that for us, please?
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by nick.bond »

This code has not changed since we ran a full system regression test last Wednesday with exactly the same code taken from version control.

As this is the production cutover I can't run trials in other projects so can't check if it is just this project or not.

There are many jobs (multi-instance) that run at the same time but this has been the same for months of testing.

It is just this job that is showing the issue, but then this is the only 'control' job we have.

I have contacted support but because of the time the response has not been good so we'll have to wait until the morning for anything worthwhile.

We have now finished the main transform so it isn't until midday that we have another period of heavy transform to perform. Even if we can work around it I am still interested in getting to the bottom of what has caused this in case it ever happens on another project.

Thanks for help, will post any further news if I get it.

Nick.

Post by nick.bond »

Craig,

We are performing a migration for the client. Up until now we have performed many practice runs successfully, and now that we are performing the migration for real we have this issue.

Post by ArndW »

Nick - let us see what is happening at line 250 of your DSD.WriteLog program and that might give a hint as to the cause.

a) Enter TCL or the command line execution in the Administrator.
b) "LIST VOC DSD.WriteLog F2 F3" to get the program name and file name for this routine.
c) "VLIST DSD.BP {Basic Program Name}" - this will output the pseudocode. The 2nd or 3rd column is the source line number. Post several lines before and after to this thread. The Program name is probably DSD.WRITELOG_B or similar, but I'm not at a DataStage machine to check.

Post by nick.bond »

I can get the name of the program with

Code: Select all

LIST VOC "DSD.WriteLog" F2 F3
and the column F2 is
DSD_BP.O/DSD_WriteLog.B
But using different variations of the second command you provided I can't get the desired output. Can anyone help with the correct syntax?

Thanks, Nick.

Post by chulett »

Code: Select all

VLIST DSD_BP DSD_WriteLog.B

Post by ray.wurlod »

Problems with DSD.WriteLog() might be in the hashed file itself, or the key value supplied (indirectly), rather than in the code. This function has been around since version 1.0, so I do suspect that that's the case. Is your control a job sequence, or a job control routine? If the latter, can you identify any calls to DSLogInfo(), DSLogWarn() or DSLogFatal() therein? If the former, you need to look at the generated job control code associated with any Routine activity.

Post by ArndW »

My thought is that if the error occurs during a write statement it is probably due to bad data (a null in the key, for example); otherwise the pseudocode statement will help pinpoint the cause.

Post by ray.wurlod »

Yes, but with DSLogInfo() and its ilk we have no control over the key. Presumably DSD.WriteLog gets the next key (event #) value from the control record //SEQUENCE.NO and increments that. Yes, looking at the VLIST output will verify that supposition.
Last edited by ray.wurlod on Mon Oct 08, 2007 4:47 am, edited 1 time in total.

Post by nick.bond »

Thanks for replies. I will check these ideas when I get back to work.

Regards, Nick.

Post by ray.wurlod »

The subroutine that writes to the log runs from line 265 (address 0x684) to line 288 (address 0x78c). As you can see, it uses the value from the control record //SEQUENCE.NO - which is an integer. So I don't think that's the problem.

The problem may be in one of the other routines invoked in DSD.WriteLog. Look at the subr instructions to see which these are. However, none of these is a user-written routine.

Post by nick.bond »

Some more info:

The design.

Multi-Instance Job (ExecJobJC) Calls Routine (ExecJob) which sets parameters and calls other jobs.

ExecJobJC is not a sequence but a server job written with Job Control.

A selection of log messages from ExecJobJC when the error occurs are:
ExecJobJC -> (TfmGemSF_CUG11_ACTU.0000_17): Job run requested
Mode (row/warn limits) = 0/50
Job Parameters --->
pParam1 = ??
pParam2 = ??
...
DSJobController=ExecJobJC
ExecJobJC.TfmGemSF_CUG11_ACTU_0000_17.JobControl (DSRunJob): Waiting for job TfmGemSF_CUG11_ACTU.0000_17 to start
ExecJobJC.TfmGemSF_CUG11_ACTU_0000_17.JobControl (fatal error from ExecJobJC): Error when trying to run job "TfmGemSF_CUG11_ACTU.0000_17"
Attempting to Cleanup after ABORT raised in stage ExecJobJC.TfmGemSF_CUG11_ACTU_0000_17.JobControl
Job ExecJobJC.TfmGemSF_CUG11_ACTU_0000_17 aborted.
An extract from the routine ExecJob is

Code: Select all

      * run the job
      *
      vErrCode = DSRunJob(vhJob, DSJ.RUNNORMAL)

      If vErrCode <> 0 Then
         vMessage = "Error when trying to run job ":Quote(aExecJobName)
         Call DSLogFatal(vMessage, DSJobName)
      End 
It would appear the problem is with the call to DSRunJob. Unfortunately the code doesn't log the error code of DSRunJob and I can't hack it at the moment.

When the job is reset I get:
From previous run
DataStage Job 9 Phantom 3241
Job Aborted after Fatal Error logged.
Program "DSD.WriteLog": Line 250, Abort.
Attempting to Cleanup after ABORT raised in stage ExecJobJC.TfmGemSF_CUG11_ACTU_0000_17.JobControl
Is this simply because DSLogFatal has been called, or is it hinting at the issue DSRunJob had?
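
The ExecJob extract in this post discards vErrCode, so the log cannot show why DSRunJob failed. Below is a minimal sketch of one way to capture it, assuming the same variable names as that extract (vhJob, vErrCode, vMessage, aExecJobName) and that the routine can be changed once the cutover allows. It is not the routine as deployed. DSLogFatal is documented to abort the calling job after writing its message, so the abort itself is expected behaviour and may be all that the DSD.WriteLog message reflects; only the message text changes here.

```basic
      * Hypothetical revision of the ExecJob extract: include the
      * DSRunJob return code in the fatal message so that the log
      * records why the run request failed.
      vErrCode = DSRunJob(vhJob, DSJ.RUNNORMAL)

      If vErrCode <> 0 Then
         vMessage = "Error when trying to run job ":Quote(aExecJobName)
         vMessage := " (DSRunJob returned ":vErrCode:")"
         * DSLogFatal writes the message and then aborts the calling job
         Call DSLogFatal(vMessage, DSJobName)
      End
```

The logged value could then be compared with the DSJE.* error constants defined in JOBCONTROL.H to narrow down the cause.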