get all jobs from a workflow.



jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

get all jobs from a workflow.

Post by jasper »

Hi,

For several reasons I would like to make a script that can get all job names that are run in a workflow.
If we have this list we can do things like:
- dump relevant log entries into one logfile
- compile all these jobs
- check if one of the jobs is running now, before starting the flow (maybe only for non-multiple-instance jobs)
- ...

I suppose this will need to be done through a query on the underlying DS database. Does anyone have an idea about which tables to use?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Good morning Jasper,

Most of what you would like to do is available in different places within DataStage.

Since you mention "script" I will assume you want to get all of this from somewhere outside of DataStage, so your main source will be variations on the UNIX "dsjob" command to get the log entries and status of a job.
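
As a rough sketch (the project and job names are placeholders, and the exact options should be checked against the dsjob documentation for your release):

    # current status and run information for one job
    dsjob -jobinfo MyProject MyJob

    # summary of the warning entries in the job's log
    dsjob -logsum -type WARNING MyProject MyJob

    # full text of one log entry, by event id
    dsjob -logdetail MyProject MyJob 42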

The external compile is one thing I would not recommend - you can reset jobs from a script, but compilation should be done by the developer when the job is created.

You really do not need to query the underlying UniVerse database tables for this; since those table structures are not documented they are subject to change, so whatever you write today might not work tomorrow.
jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

ArndW,

Indeed, for getting the log entries I will use dsjob. Compiling is something I will only do after failures (we have a problem with BASIC Transformers that keep hanging; the solution is to unlock everything and recompile).

The main question however is: how do I find all jobs within a workflow? I don't see this in the dsjob documentation.

Some explanation of our environment:
- scheduling is done through Autosys
- Autosys calls a UNIX script for each workflow
- this UNIX script just sets some environment parameters and then runs the workflow through dsjob (roughly as in the sketch below)
- the script currently has no idea which DataStage jobs are in the workflow
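
Roughly like this (a sketch only; the project, sequence and parameter names are made up):

    #!/bin/ksh
    # hypothetical wrapper called by Autosys for one workflow
    . $DSHOME/dsenv                      # source the DataStage environment

    PROJECT=MyProject
    WORKFLOW=MySequence

    # run the sequence and wait for it, returning its status as the exit code
    dsjob -run -param RunDate=$1 -param Env=PROD \
          -wait -jobstatus $PROJECT $WORKFLOW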

What I would like to add:
- before running the workflow, check that all jobs in the workflow are OK to run (not running, compiled, ...); a reset can of course be done in the flow
- after the run: get all warnings/fatals from all the logs and dump them to a UNIX log (see the sketch below)
- if the run failed: check if there are still processes hanging for BASIC Transformers; if so, kill them, unlock all and compile all
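
Something like this is what I have in mind once I have the job list (jobs.txt is a placeholder for that list, and the -logsum types should be checked against the dsjob docs for our version):

    #!/bin/ksh
    # hypothetical pre-run check and post-run log dump for all jobs in the flow
    PROJECT=MyProject
    LOG=/tmp/flow_run.log

    # before the run: show the status of every job so the script can decide
    # whether the flow is OK to start
    while read JOB
    do
        dsjob -jobinfo $PROJECT $JOB
    done < jobs.txt

    # after the run: collect warnings and fatals from every job into one UNIX log
    while read JOB
    do
        echo "==== $JOB ====" >> $LOG
        dsjob -logsum -type WARNING $PROJECT $JOB >> $LOG
        dsjob -logsum -type FATAL   $PROJECT $JOB >> $LOG
    done < jobs.txt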


I have a good idea how to do all these things (mainly thanks to this forum), but the starting point of course is to get all jobs in the flow (I don't want to change the script every time a job is added to or deleted from the flow).
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Jasper,

In my current project we have done something quite similar. Essentially we have a table (in Oracle, copied to a Hash file) which contains the Sequence and Job structure (the unique primary key is Name & Parent), so we can compute which jobs are called by which sequences - this is necessary with over 1000 jobs.

This table is generated and updated by the programmers; I wrote some code [for the customer, so I can't publish it] that goes through the internal tables in sequences to make sure that the called programs are in the table. I think in your case you could build your copy of the relationships using a full dsexport and then write a small program/job to parse this file and get all sequences and the jobs they call.
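
As a very rough illustration (BEGIN DSJOB and Identifier are what a .dsx export normally contains, but the property that carries the called job name varies by release, so treat "JobName" below as a guess and verify it against one of your own exported sequences first):

    # hypothetical: list "sequence <tab> called job" pairs from a full export
    awk '
        /BEGIN DSJOB/ { seq = "" }
        /Identifier /  { if (seq == "") seq = $2 }   # the exported job or sequence itself
        /JobName /     { print seq "\t" $2 }         # job referenced by a Job Activity (guessed property)
    ' export_all.dsx | tr -d '"' | sort -u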

Can you not use the "reset" option instead of re-compiling? If you add this switch to your sequencer calls you won't have to worry about a called job's state when executing it; and the sequence can also wait for it to finish running if it has been called elsewhere.
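
If you want to do it from your script rather than inside the sequence, dsjob can reset a job as well; a sketch:

    # reset an aborted job before re-running it
    dsjob -run -mode RESET MyProject MyJob

    # then run it normally
    dsjob -run -wait -jobstatus MyProject MyJob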
jasper
Participant
Posts: 111
Joined: Mon May 06, 2002 1:25 am
Location: Belgium

Post by jasper »

Thanks, I'll try it from the export.

Resetting is not enough; what's happening is that processes still hang (the next run will give "unable to run" on the BASIC Transformer stage, while the status looks OK).

We have a case open for this with DataStage support, who have already admitted it's a bug and are working on it.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

All of this is available in EtlStats and GenHtml; both are free. EtlStats can get row counts and run times for a sequence and populate several tables so you can run reports to see if a job is slowing down over time. A lot of these types of reports are included. The reports can be automated so they are emailed to you at the end of each sequence. I email the row counts for each job at the end of my sequences. I also email the logs of jobs that finished with something other than an OK status.

GenHtml will document any job, including all jobs in a sequence. It will create an index for the sequence with links to the web pages for each job, so you get a web page named Seq_Index.html where Seq is your sequence name. If you do all jobs then you get All_Index.html.

Both of these are free to download from my tips page. You get source for all but 2 routines, so you can figure out the workflow of a sequence from my code.

DwNav is a product I sell quite inexpensively; the next version is going to cost a lot more. It combines both of these. It lets you browse several levels deep into workflow or job dependencies, so if a sequence calls a sequence which calls a sequence, you can drill down into this relationship. It will also generate HTML documentation for each job or sequence and combine row counts and run times into the documentation. You have to extract the row counts once in a while to be able to browse them.
Mamu Kim