Aborting a Job when a Semaphore is set

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland

Aborting a Job when a Semaphore is set

Post by peternolan9 »

Hi All,
I have a table called DW_BATCH_CONTROL which contains a field called BATCH_COMPLETE_FLAG.

The second thing a new batch does is insert a new row setting BATCH_COMPLETE_FLAG=0.

The first thing I want my batch to do is to perform a
select count(*) from DW_BATCH_CONTROL WHERE BATCH_COMPLETE_FLAG=0.

If it returns 0, then no batch is in progress and the entire batch can start.

If it returns a number greater than zero (usually 1), I know that there is a batch in progress and I should abort this attempt to start the batch.

I want to do this so that, in cases where a human is running something manually and it does not complete before the next scheduled batch, it is not possible for DS to automatically start the next batch....

I know I can write the abort in DS control language... but how do I know whether the count returned a number greater than 0?

Has anyone set up a semaphore and an abort like this?

Thanks
Best Regards
Peter Nolan
www.peternolan.com
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Peter

Hester did something similar to this in his batch automation. It is pretty easy in BASIC, using ODBC, to get the results you want. You have to start all your jobs through this process; otherwise you need to check whether any job is running manually as well. You cannot force DataStage not to run jobs, or to go through this step of checking the DW_BATCH_CONTROL table.
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If DW_BATCH_CONTROL is a hashed file (or UniVerse table) it's really easy. I suspect it's not, however.

The "least code" approach is to have a small job perform the SQL capturing the result into a text file, then "read" the text file. Sufficient would be

Code: Select all

Call DSExecute("UNIX", "head -1 filename", Output, ExitStatus)
The Output variable contains the number in which you're interested:

Code: Select all

If Trim(Output<1>) = 0 Then
   * code to run the batch
End Else
   * code to log a message that the batch was not started because something is happening
End
Of course, you need to ensure that any manually-run job as well as any automatically-run job updates DW_BATCH_CONTROL when it starts and when it ends. I normally ensure this with standard before-job and after-job subroutines.
You could, of course, code the whole thing using BCI (= ODBC API) functions, provided you have a licensed ODBC driver for the database where DW_BATCH_CONTROL lives.
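A minimal sketch of what that BCI check might look like, offered as a hedged illustration only: the DSN, user name, and password are placeholders, and the call set and constant names follow the ODBC 1.0 style that BCI mirrors (check the ODBC.H include file that ships with UniVerse for the exact names).

Code: Select all

* Hedged sketch of the BCI approach; DSN, user and password are placeholders
$INCLUDE UNIVERSE.INCLUDE ODBC.H
Status = SQLAllocEnv(hEnv)
Status = SQLAllocConnect(hEnv, hConn)
Status = SQLConnect(hConn, "DW_DSN", "dw_user", "dw_password")
Status = SQLAllocStmt(hConn, hStmt)
Query = "SELECT COUNT(*) FROM DW_BATCH_CONTROL WHERE BATCH_COMPLETE_FLAG = 0"
Status = SQLExecDirect(hStmt, Query)
Status = SQLBindCol(hStmt, 1, SQL.B.INT, BatchCount)
Status = SQLFetch(hStmt)
* BatchCount = 0 means no batch in progress, so it is safe to start
Status = SQLFreeStmt(hStmt, SQL.DROP)
Status = SQLDisconnect(hConn)
Status = SQLFreeConnect(hConn)
Status = SQLFreeEnv(hEnv)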
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

Kim,

Actually, I do this check a bit differently (no ODBC here, just good old RetrieVe).

This solution would use the following code (and would not filter on process) -

Code: Select all

OUTPUT = ""
RSTATUS = 0
CMD = "SELECT DW_BATCH_CONTROL WITH BATCH_COMPLETE_FLAG = 0"

Call DSExecute("UV", CMD, OUTPUT, RSTATUS)

IF RSTATUS THEN
     * cannot start batch - RSTATUS holds the number of records selected
END ELSE
     * can start batch - no rows have BATCH_COMPLETE_FLAG = 0
END


Also, if you wanted to check whether a particular process, or more than one process, has completed (a kind of dependency check) before starting a new batch, you could pass in a space-delimited string of processes (PROCESSLIST) and change the CMD variable above to be -

Code: Select all


CMD = "SELECT DW_BATCH_CONTROL = ":PROCESSLIST:" WITH BATCH_COMPLETE_FLAG = 0"

If all of the processes are complete then this will return 0; otherwise it will return however many are not complete. You could then even get fancy and do some reporting on how many are complete, etc...
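A hedged usage sketch (the process IDs here are hypothetical, and with the "UV" command type DSExecute hands back the number of records selected in its status argument) -

Code: Select all

* Hypothetical record IDs for the processes to check, quoted for RetrieVe
PROCESSLIST = "'LOAD_CUSTOMERS' 'LOAD_SALES'"
CMD = "SELECT DW_BATCH_CONTROL = ":PROCESSLIST:" WITH BATCH_COMPLETE_FLAG = 0"

Call DSExecute("UV", CMD, OUTPUT, RSTATUS)
* RSTATUS = 0 here means all of the listed processes are complete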

Regards,
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Of course, Michael's solution only works if DW_BATCH_CONTROL is a UniVerse table! As I noted earlier, I suspect that this would not be the case in one of Peter's implementations.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

Ray,

Very true and I forgot to mention this when I posted, but I also read the following in his post -
I have a table called DW_BATCH_CONTROL which contains a field called BATCH_COMPLETE_FLAG.

The second thing a new batch does is insert a new row setting BATCH_COMPLETE_FLAG=0.

The first thing I want my batch to do is to perform a
select count(*) from DW_BATCH_CONTROL WHERE BATCH_COMPLETE_FLAG=0.
to mean that there was a UniVerse table.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well... let me throw something out on the table. Something I set up way back in version 3-dot-something-or-other, when I wasn't strong in UniVerse magic. :wink:

To Peter's original question - how to tell, when polling for something, that a number of rows other than zero has been found - I set up something similar to what Ray suggested, done by a Server job that read a table looking for unprocessed rows. I didn't select a count, but rather selected the rows themselves and processed them through the job. It had to do more than just count, but that doesn't really matter here. The 'polling' job could simply select the rows and then not pass them through, or could write them to something like /dev/null. Or heck, it could even update the rows it finds and flag them as now (or shortly) being 'in progress'.

It doesn't really matter, because all you would need to do is interrogate the job after it completes to see how many rows it selected. DSGetLinkInfo can get the number of rows down a particular link, and then decisions can be made based on that number.

You could run the job inside a loop: if the count is non-zero, sleep for X and then run the job again; if the count is zero, break out of the loop and run your batch; if it loops too long, give up. My little setup has been doing this daily for something like 4 years now. Sleep-time and number-of-times-to-loop parameters control the polling 'window'.
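A hedged sketch of that loop as batch job control code - the job, stage, and link names are hypothetical, and the sleep and retry limits are illustrative:

Code: Select all

* Hypothetical job/stage/link names; sleep seconds and retry count are illustrative
MaxTries = 30
SleepSecs = 60
Attempt = 0

Loop
   Attempt = Attempt + 1
   hJob = DSAttachJob("PollBatchControl", DSJ.ERRFATAL)
   ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
   ErrCode = DSWaitForJob(hJob)
   RowCount = DSGetLinkInfo(hJob, "ReadControl", "ToNull", DSJ.LINKROWCOUNT)
   ErrCode = DSDetachJob(hJob)
Until RowCount = 0 Or Attempt >= MaxTries Do
   Sleep SleepSecs
Repeat

If RowCount > 0 Then
   Call DSLogFatal("Batch still in progress after polling window", "BatchStart")
End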

No BCI functions, no licensed ODBC drivers needed and the target could still be an Oracle table easily populated by other applications.

When you get right down to it, not all that different from what Ray proposed, I guess. Just thought it might be either another avenue to explore or something that might get the creative juices flowing.
-craig

"You can never have too many knives" -- Logan Nine Fingers
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland

Post by peternolan9 »

Ray,
you know me too well... ;-) And you are right, the only place for data to live is in a database and DW_BATCH_CONTROL is, indeed, in Oracle...

However, it is trivial to copy it to a hashed file at the start of the batch... The way that looks best to me is just to copy the table to a hashed file and perform the select as described... then crash the job if a row with BATCH_COMPLETE_FLAG = 0 is found...
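A minimal sketch of that check, assuming the hashed-file copy is also called DW_BATCH_CONTROL and has a VOC pointer and dictionary entry in the project account (DSLogFatal is what actually 'crashes' the job):

Code: Select all

* Hedged sketch; assumes the hashed-file copy is named DW_BATCH_CONTROL
Call DSExecute("UV", "SELECT DW_BATCH_CONTROL WITH BATCH_COMPLETE_FLAG = 0", Output, InProgress)
If InProgress > 0 Then
   Call DSLogFatal("A batch is already in progress - aborting", "BatchStartCheck")
End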

I only really want to do this at startup of the overnight batch, not for each job... after all, if a job is being started in the middle, it must be being started by a person, and one would hope the person is qualified to be doing so... (perhaps a vain hope... LOL)...
ray.wurlod wrote:Of course, Michael's solution only works if DW_BATCH_CONTROL is a UniVerse table! As I noted earlier, I suspect that this would not be the case in one of Peter's implementations.
Best Regards
Peter Nolan
www.peternolan.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

My solution to the "manual" approach was cited earlier:
Of course, you need to ensure that any manually-run job as well as any automatically-run job updates DW_BATCH_CONTROL when it starts and when it ends. I normally ensure this with standard before-job and after-job subroutines.

Typically these routines are called "RecordJobStart" and "RecordJobFinish".

It was a matter of great joy to me when Ascential introduced the DSJ.JOBINTERIMSTATUS key to DSGetJobInfo() - it allows one to determine the status of the current job as if the job had already completed, even though it is actually still running its after-job subroutine.

The actual form of these two subroutines depends on whether the client has a licensed ODBC driver (in which case BCI functions can be used to update a database table directly) or, if not, whether the client prefers a local staging table (to be moved to the database subsequently, or immediately, by executing a standard job) or performing the update "directly" via a script (for example PL/SQL for Oracle).
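A hedged skeleton of what such an after-job subroutine might look like - the part that actually writes to DW_BATCH_CONTROL is elided, since it depends on the choices above:

Code: Select all

* Hedged skeleton of an after-job subroutine; the table update itself is elided
Subroutine RecordJobFinish(InputArg, ErrorCode)
$Include DSINCLUDE JOBCONTROL.H
   ErrorCode = 0   ;* returning non-zero would abort the calling job

   JobName = DSGetJobInfo(DSJ.ME, DSJ.JOBNAME)
   Status = DSGetJobInfo(DSJ.ME, DSJ.JOBINTERIMSTATUS)

   * ... update DW_BATCH_CONTROL here via BCI, a staging table, or a script ...

   If Status = DSJS.RUNOK Then
      Call DSLogInfo("Job ":JobName:" finished OK", "RecordJobFinish")
   End Else
      Call DSLogWarn("Job ":JobName:" finished with status ":Status, "RecordJobFinish")
   End
Return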
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland

Post by peternolan9 »

Hi Ray,
yes, I agree that a far better way of handling 'manual restarts' is needed in DS...

I have had the unfortunate experience of a person manually correcting a batch stream after a 'running out of space' failure and messing up the database beyond repair... and since the downloaded change files had also been deleted, we could not go back to the backup... there was nothing to do but rebuild the entire DW from scratch... about 80 hours of processing time... :-(

(PS. My scheduler takes care of this... all I have to do in the scheduler is fix the problem, then go into the row that defines the batch and type 'Restart' in the status field, and the scheduler will restart the batch at the point of failure... pretty cool... no more 'human error' in restarts.

Also, since the scheduler handles dependencies 'properly', it will actually complete all work that is not specifically dependent on the failed job before stopping the batch in an abended state... this would have saved me 12 hours of processing time just yesterday. A 'typo' killed a batch stream I submitted overnight, and I came in the next morning to find the batch had only run for 2 hours... we made the one-line change and then restarted the batch... but of course the processing then had to happen during the day, and we lost about 6 hours of testing time... hhhmmmm... We were testing the production batch schedule, and for that we are using DS BASIC to control the batch, not my 'home-grown scheduler'... hhhhmmmm... such is life...)
ray.wurlod wrote:My solution to the "manual" approach was cited earlier:
Of course, you need to ensure that any manually-run job as well as any automatically-run job updates DW_BATCH_CONTROL when it starts and when it ends. I normally ensure this with standard before-job and after-job subroutines.
Best Regards
Peter Nolan
www.peternolan.com