To get the status of Jobs

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

kumar66
Participant
Posts: 265
Joined: Thu Jul 26, 2007 12:14 am

To get the status of Jobs

Post by kumar66 »

Hi All,

I need to get the input link row count, reject link row count and output link row count of all jobs that ran in a particular batch.

For example, if there are two job sequences in that batch and each sequence contains 5 jobs, I need to get the above information for all of them.

Please advise how to achieve this.

Thanks & Regards,
Kumar66.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Several ways, all of which have been discussed here ad nauseam. Classic answer is for you to go to Kim Duke's website and download his free ETLStats package. It does that and a crap-ton more... take what you need from it or leverage the whole package. Nowadays that kind of metadata is collected by DataStage directly in the Operations Console, depending on your version. Sorry, don't remember exactly when that was added but I'm sure some kind soul will chime in with the answer.

Otherwise it's all up to you to code something up that leverages the API. That means 'job control' code written in BASIC or UNIX scripts depending on whatever you are most comfortable building and maintaining. For example, from the command line you can use dsjob as noted here. All of the functions listed have an equivalent in BASIC as a 'DSGetxxxx' function.
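From a script, those dsjob calls can be wrapped in a few lines. This is a minimal sketch, assuming dsjob is on the PATH of the DataStage server; the project, job, stage, and link names in the usage example are made up for illustration.

```python
import subprocess

def linkinfo_cmd(project, job, stage, link):
    # dsjob -linkinfo reports the row count for one link of one stage
    return ["dsjob", "-linkinfo", project, job, stage, link]

def jobinfo_cmd(project, job):
    # dsjob -jobinfo reports the job's status and timings
    return ["dsjob", "-jobinfo", project, job]

def run_dsjob(cmd):
    # Must run on the DataStage server, where dsjob is available
    return subprocess.run(cmd, capture_output=True, text=True).stdout
```

Usage would be something like `run_dsjob(linkinfo_cmd("MyProject", "LoadCustomers", "xfm", "lnk_out"))` (hypothetical names).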
-craig

"You can never have too many knives" -- Logan Nine Fingers
Amedhyaz
Participant
Posts: 6
Joined: Fri Mar 30, 2012 10:52 am

Re: To get the status of Jobs

Post by Amedhyaz »

To complement the previous post from chulett: the Operations Console became available in release 8.7, launched around November 2011, and was further enhanced in version 9.1 with the notable addition of a Workload Management feature.
Amedhyaz
DataStage & Metadata Workbench Developer
Information Server Administrator
CA4T
Premium Member
Posts: 12
Joined: Tue Jan 15, 2013 11:27 am

Re: To get the status of Jobs

Post by CA4T »

I can't get the link to Kim Duke's website to launch... is that the correct link?
Argus Health
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It does seem to be inaccessible at the moment. Wait for Kim to get back from vacation. Watch this space.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

I closed my web site down. Dennis is working on moving the files to a new site.
Mamu Kim
kvshrao
Participant
Posts: 6
Joined: Fri Aug 03, 2012 12:25 am
Location: Chennai
Contact:

Re: To get the status of Jobs

Post by kvshrao »

You can use the DSJobReport after-job subroutine to generate the statistics.
Thanks
Ravi
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Just wondering...

This is posted as a parallel job question.
Will it work the same?

How do you set/get the user status for the parallel jobs?
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

There are a couple of issues going on here. In older versions of DataStage you need to generate the XML yourself to get the row counts. In newer versions these files are generated automatically; you may need to set an option.

The dsjob command can generate the XML. You need the -report option with the XML format to generate an XML file that can be processed easily; otherwise the report just shows the row counts as plain text. Run the dsjob command from the UNIX or DOS prompt, based on what type of server you are on. You need to run the command on the DataStage server, not your client, although it will run on your client. When a job runs dsjob as a command, it runs on the server, so to simulate this, run it on the server.
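As a sketch of that command line, the report can be captured to job_name.xml like this, assuming dsjob is on the PATH of the server; the project and job names passed in are placeholders.

```python
import subprocess
from pathlib import Path

def report_cmd(project, job):
    # dsjob -report <project> <job> XML writes the run report, as XML, to stdout
    return ["dsjob", "-report", project, job, "XML"]

def capture_report(project, job, out_dir="."):
    # Run on the DataStage server and save the report as <job>.xml
    result = subprocess.run(report_cmd(project, job),
                            capture_output=True, text=True, check=True)
    out = Path(out_dir) / f"{job}.xml"
    out.write_text(result.stdout)
    return out
```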

You can run this from a routine. Using DSAttachJob and all the API calls is, in my opinion, a lot more work than running the command. You need to be pretty good with DataStage BASIC to create these routines, but there are tons of examples on this forum, so it should not be too hard.

EtlStats as a whole is a set of examples of how to get metadata out of DataStage, so it is a bunch of routines and jobs. You only need one of these jobs to get runtime metadata with row counts. One of the parameters for that job is the name of the job you want row counts for. That job name is fed into the before-job command, which runs dsjob -report XML and outputs to a file called job_name.xml. This file is the source for the DS job, and it is processed 3 times to output to 3 tables:

ETL_JOB_HIST
ETL_ROW_HIST
ETL_PARAM_HIST

You can modify this job to output to any table or file you want. This job is a server job because it was written a long time ago, before PX jobs were popular.

If you import the runtime XML into DataStage and look at it, it becomes obvious that it is fairly straightforward. It has more than one repeating group: the row counts are stored in one group and the parameters in another, which makes sense. The rows are tied to stage names and link names.

If you are looking at row counts for PX jobs, then the report has row counts at the partition level. You need to aggregate these up to the link level or add the partition number to the key of ETL_ROW_HIST. You really need to add STAGE_NAME to this key too. Link row counts are repeated because the report shows input and output values; the input row count always matches the output row count for a given link. You never get PX rejected rows unless you have a reject link attached to the target stage, so your job design is critical.
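The partition-to-link aggregation step can be sketched like this. The element and attribute names below are illustrative placeholders, not the real DSReport schema, but the logic of summing partition-level counts up to the (stage, link) level is the same.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Illustrative XML in the spirit of the dsjob report; treat the element
# and attribute names as placeholders, not the actual DSReport schema.
sample = """
<JobReport job="LoadCustomers">
  <LinkRun stage="xfm" link="lnk_out" partition="0" rowcount="120"/>
  <LinkRun stage="xfm" link="lnk_out" partition="1" rowcount="130"/>
  <LinkRun stage="xfm" link="lnk_rej" partition="0" rowcount="4"/>
  <LinkRun stage="xfm" link="lnk_rej" partition="1" rowcount="1"/>
</JobReport>
"""

def link_totals(xml_text):
    # Sum partition-level row counts up to (stage, link) level
    totals = defaultdict(int)
    for run in ET.fromstring(xml_text).iter("LinkRun"):
        key = (run.get("stage"), run.get("link"))
        totals[key] += int(run.get("rowcount"))
    return dict(totals)
```

Keying the totals by (stage, link) mirrors the point above about adding STAGE_NAME to the key of ETL_ROW_HIST.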

Parameter values are stored in a similar way within the XML. Most people don't care about this, but it has saved us a lot of time, because I have seen too many developers run a job with default parameters that point to the wrong database. Usually they point to DEV, so development is easy. They may also run with the wrong config file, which makes the job run slower. These values are valuable.

There are several SQL statements in EtlStats to get the last row counts for all jobs and the last parameter values, so you can quickly see what the last run did.

One other job in EtlStats is very useful: the one whose name ends in "Driver". This job is used to run the job described above. If you run it with a job name of "ALL", it will loop through every job in the project and get row counts for all of them, which is very useful when first setting this up. If you give it a job name that is a sequence, it will loop through all jobs in the sequence's "Dependents" job property and get row counts for every job run by that sequence.
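That "Driver" loop can be sketched as one report command per dependent job, again assuming the dsjob -report XML form; the project and job names are placeholders.

```python
def driver_cmds(project, job_names):
    # One dsjob -report XML command per job, in the spirit of the Driver job
    return [["dsjob", "-report", project, job, "XML"] for job in job_names]
```

For example, `driver_cmds("MyProject", ["Job1", "Job2"])` yields the two commands a wrapper script would run in turn.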

It takes several minutes to get row counts for a job, so be careful about waiting on row counts after every job. If your job runs in a couple of minutes and you then collect row counts, your ETL can take twice as long to finish. It is much better to get row counts as a separate process.
Mamu Kim
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

EtlStats is on Ray's web site.
http://www.raywurlod.com/
Mamu Kim