Detecting/Notifying Job Aborts

Archive of postings to DataStageUsers@Oliver.com. This forum is intended only as a reference and cannot be posted to.

Moderators: chulett, rschirm

Locked
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Detecting/Notifying Job Aborts

Post by admin »

This is a topic for an orphaned message.

Post by admin »

It cannot be done in an after-job routine, since that routine is not executed if the job has aborted (!). If, on the other hand, you run the job that aborts from another job, via a job control routine, then the controlling job (routine) can test the exit status of the controlled job. There are 11 possible status values (see JOBCONTROL.H or the help on DSGetJobInfo). Based on the exit status (DSJS.RUNFAILED = aborted, but also check for DSJS.CRASHED), you can then do whatever you like. To send email, call the DSExecute subroutine with any command-line mail program, setting the first argument (the shell) to suit your operating system (UNIX or NT).
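
The decision described above can be sketched roughly as follows. This is a Python illustration, not DataStage BASIC: the `JobStatus` members only borrow the names of the DSJS.* constants (their values here are placeholders, not the ones in JOBCONTROL.H), and `needs_alert`, `mail_command`, `on_job_finished` and the NT program `mailer.exe` with its flags are hypothetical names invented for the example. In a real controlling routine the status would come from DSGetJobInfo and the command would be handed to DSExecute.

```python
from enum import Enum, auto
from typing import List, Optional


class JobStatus(Enum):
    """Stand-ins for the DSJS.* constants; values here are placeholders."""
    RUNOK = auto()
    RUNWARN = auto()
    RUNFAILED = auto()  # the job aborted
    CRASHED = auto()    # the job crashed


# Statuses that should trigger a notification.
FAILURE_STATUSES = {JobStatus.RUNFAILED, JobStatus.CRASHED}


def needs_alert(status: JobStatus) -> bool:
    """Return True when the controlled job aborted or crashed."""
    return status in FAILURE_STATUSES


def mail_command(shell: str, recipient: str, subject: str) -> List[str]:
    """Build a mail command line for the given shell type (illustrative only)."""
    if shell == "UNIX":
        return ["mailx", "-s", subject, recipient]
    if shell == "NT":
        # Hypothetical program and flags: substitute whatever mailer is installed.
        return ["mailer.exe", "/to", recipient, "/subject", subject]
    raise ValueError(f"unknown shell type: {shell}")


def on_job_finished(job: str, status: JobStatus, shell: str,
                    recipient: str) -> Optional[List[str]]:
    """Return the mail command to run, or None when no alert is needed."""
    if not needs_alert(status):
        return None
    return mail_command(shell, recipient, f"DataStage job {job}: {status.name}")
```

The sketch only builds the command; keeping the "should we alert?" test separate from the "how do we mail?" step makes it easy to swap the mail program per platform.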

> ----------
> From: Conder, Scott (Ivy Hill)[SMTP:Scott.Conder@ivyhill-wms.com]
> Reply To: informix-datastage@oliver.com
> Sent: Tuesday, 31 October 2000 00:51
> To: informix-datastage@oliver.com
> Subject: Job Aborts
>
> I'm looking for some way for DataStage or an "after job subroutine" to
> send an e-mail notification if the job being run aborts. I almost
> expected that to be a built-in function, but apparently not, because when
> I called support, they acted like nobody had ever thought of it before.
> I know there are a bunch of possibilities. Any suggestions?
>
> Scott Conder
> Ivy Hill Corporation
> Louisville, KY
> 502.458.5303 ext. 4045
>

Post by admin »

Our standard overnight processing includes sending email to our support staff and to a paging service if certain critical jobs fail, do not start within a specified time, or do not complete within the required time. We do all this in a Windows NT environment (but it would work just the same on Unix).

As Ray describes, it is all done in the controlling job(s).

If you want more detail, let me know.

-----Original Message-----
From: Ray Wurlod [SMTP:ray.wurlod@informix.com]
Sent: Tuesday, October 31, 2000 7:18 AM
To: informix-datastage@oliver.com
Subject: RE: Detecting/Notifying Job Aborts

*************************************************************************
This e-mail and any files transmitted with it may be confidential and are intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail in
error, please notify the sender by return e-mail, and delete this e-mail from your in-box. Do not copy it to anybody else

*************************************************************************

Post by admin »

Dave,

Of course we would like the detail. ;-)

Brad Vincent
Compuware

> -----Original Message-----
> From: David Barham [SMTP:David.Barham@Anglocoal.com.au]
> Sent: Monday, October 30, 2000 6:49 PM
> To: informix-datastage@oliver.com
> Subject: RE: Detecting/Notifying Job Aborts

Post by admin »

Yeah, I'm still just a little confused. If you could be a little more specific, that would be great. For instance, do you have one job that you pass other job names to as parameters, which are then run from custom routines (or after-job subroutines, or transformer functions, or what) using the "dsjob -run" command?

Then, what command line program are you using to generate and send the e-mail? I have heard of a product called "blat" which does just that...do you recommend something different?

All help is much appreciated. Thanks

Scott


-----Original Message-----
From: Vincent, Brad [mailto:BVincent@dmc.org]
Sent: Tuesday, October 31, 2000 7:48 AM
To: informix-datastage@oliver.com
Subject: RE: Detecting/Notifying Job Aborts


Post by admin »

Now I've opened a real can of worms. I'd probably, no, make that definitely, be overstepping the mark by publishing the 700-odd lines of job control routine (not counting many supporting functions for logging errors, sending email, etc.), so the best approach is probably to cover its main features.

A quick synopsis.

* We have a standard job control routine that is called from every controlling job. This allows us to make modifications that can immediately be applied to all controlling jobs. Everything that follows here is about this routine.
* The main parameter to the routine (might be a function, actually, but anyway) is the name of the job "suite" we wish it to run.
* The job suites are defined in our Oracle database. Details for each job include the job name, parameters, what other jobs must run first, what other jobs must actually work first, what time the job should start (if time dependent; very few are), the "criticality" of the job and so on (there is more).
* This routine loads the list of jobs from the database (using BCI; refer to earlier posts by Ray) into an array and loops over this array, checking for jobs which can start, have finished, etc., and taking appropriate action. I never call DSWaitForJob. I always check the status of the job periodically (about every 60 seconds). As one job completes, others can start, etc. In hindsight, if I were doing it again, I would keep the working data in some sort of database table, UniVerse file or whatever to better facilitate recovery from failures.
* We have a standard routine which passes common parameters from a controlling job to a controlled job. This routine also sets some standard parameter values.
* If a job fails, I call another function which takes actions depending on the criticality of the job. This includes logging the error to our error table in Oracle and possibly sending an email message. There are any number of command line mail programs available for this. I don't have any particular recommendations. Of course, if you are using Unix rather than NT then this is not an issue.
* Controlling jobs can also call other controlling jobs. At some stages, jobs are nested about 4 or 5 levels deep.
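
The polling approach in the synopsis above could look something like the sketch below. It is written in Python purely to show the control flow and is not the routine described in the post: `JobDef`, its field names, the simplified status strings, and the `start_job`, `get_status` and `on_failure` callbacks are all hypothetical. In DataStage the callbacks would wrap the job control calls, and the job list would be loaded from the database rather than built in code. The sketch assumes prerequisites name jobs in the same suite; anything that can never start is marked as skipped.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Simplified states; a real routine would test the DSJS.* status constants.
RUNNING, OK, FAILED, SKIPPED = "RUNNING", "OK", "FAILED", "SKIPPED"


@dataclass
class JobDef:
    """One row of a suite definition (field names are hypothetical)."""
    name: str
    must_run_first: List[str] = field(default_factory=list)      # must have finished
    must_succeed_first: List[str] = field(default_factory=list)  # must have finished OK
    criticality: str = "low"


def run_suite(jobs: List[JobDef],
              start_job: Callable[[str], None],
              get_status: Callable[[str], str],
              on_failure: Callable[[JobDef], None],
              poll_seconds: float = 60,
              sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Start jobs as their prerequisites resolve, polling instead of blocking."""
    defs = {job.name: job for job in jobs}
    pending = dict(defs)         # jobs not yet started or skipped
    state: Dict[str, str] = {}   # jobs that have been started or skipped

    while pending or RUNNING in state.values():
        # 1. Refresh every running job and react to failures.
        for name, current in list(state.items()):
            if current == RUNNING:
                state[name] = get_status(name)
                if state[name] == FAILED:
                    on_failure(defs[name])

        # 2. Start (or skip) pending jobs whose prerequisites are resolved.
        progressed = False
        for name, job in list(pending.items()):
            prereqs = job.must_run_first + job.must_succeed_first
            if any(state.get(p) in (None, RUNNING) for p in prereqs):
                continue  # at least one prerequisite is still outstanding
            del pending[name]
            progressed = True
            ran = all(state[p] in (OK, FAILED) for p in job.must_run_first)
            worked = all(state[p] == OK for p in job.must_succeed_first)
            if ran and worked:
                start_job(name)
                state[name] = RUNNING
            else:
                state[name] = SKIPPED

        # 3. Nothing running and nothing startable: the rest can never start.
        if not progressed and RUNNING not in state.values():
            for name in pending:
                state[name] = SKIPPED
            pending.clear()

        if pending or RUNNING in state.values():
            sleep(poll_seconds)

    return state
```

Because the loop only polls, several independent jobs can run at once and a controlling job is never stuck waiting on a single child. The in-memory `state` dictionary is the part the post suggests would be better kept in a table or file so that a restart can pick up where it left off.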

There is no difficult technology in this. It is just a matter of the whole process being data driven. The significant investment in this infrastructure has certainly made the whole overnight suite (about 200 to 250 jobs) far more manageable. For example, we needed to run a cut-down version of the overnight suite on Saturday nights. It took about 15 minutes of cutting and pasting rows to build the new suite definition. Then we just had to schedule the overnight job with a different "suite" parameter on that one night.

I should point out that all this is done in version 3.5. I have no idea what wonderful new job control features await when I install version 4.

I hope all this helps. I'm trying to walk the fine line between sharing what we have learnt about DataStage and protecting Anglo's intellectual property.


-----Original Message-----
From: Conder, Scott (Ivy Hill) [SMTP:Scott.Conder@ivyhill-wms.com]
Sent: Tuesday, October 31, 2000 11:32 PM
To: informix-datastage@oliver.com
Subject: RE: Detecting/Notifying Job Aborts


Post by admin »

Wow...thanks. That's more information than I'd hoped for! Yes, that's exactly what I was looking for, a logical outline of the process. At least now I realize it's not going to be a simple little 4 or 5 line after-job subroutine. Thank you so much for your detailed outline, David...I'm sure it will come in very handy while I tackle this project.

Scott

-----Original Message-----
From: David Barham [mailto:David.Barham@Anglocoal.com.au]
Sent: Tuesday, October 31, 2000 6:22 PM
To: informix-datastage@oliver.com
Subject: RE: Detecting/Notifying Job Aborts
