Page 1 of 1

To get the status of Jobs

Posted: Thu Dec 26, 2013 3:30 am
by kumar66
Hi All,

I need to get the input link row count, reject link row count and output link row count of all jobs that ran in a particular batch.

For e.g. if there are two job sequences in a that batch and each sequence contains 5 jobs each , I need to get the above said info.

Please advice how to achieve this.

Thanks & Regards,
Kumar66.

Posted: Thu Dec 26, 2013 8:29 am
by chulett
Several ways, all of which have been discussed here ad nauseam. Classic answer is for you to go to Kim Duke's website and download his free ETLStats package. It does that and a crap-ton more... take what you need from it or leverage the whole package. Nowadays that kind of metadata is collected by DataStage directly in the Operations Console, depending on your version. Sorry, don't remember exactly when that was added but I'm sure some kind soul will chime in with the answer.

Otherwise it's all up to you to code something up that leverages the API. That means 'job control' code written in BASIC or UNIX scripts depending on whatever you are most comfortable building and maintaining. For example, from the command line you can use dsjob as noted here. All of the functions listed have an equivalent in BASIC as a 'DSGetxxxx' function.

Re: To get the status of Jobs

Posted: Fri Dec 27, 2013 12:12 pm
by Amedhyaz
To complement the previous post from chulett: Operations Console has become available in release 8.7, launched around November 2011 and further enhanced in version 9.1 with the notable addition of a Workload Management feature.

Re: To get the status of Jobs

Posted: Wed Jan 01, 2014 12:33 pm
by CA4T
I can't get the link to Kim Duke's website to launch... is that the correct link?

Posted: Wed Jan 01, 2014 1:20 pm
by ray.wurlod
It does seem to be inaccessible at the moment. Wait for Kim to get back from vacation. Watch this space.

Posted: Fri Jan 03, 2014 12:08 am
by kduke
I closed my web site down. Dennis is working on moving the files to a new site.

Re: To get the status of Jobs

Posted: Fri Jan 03, 2014 8:05 am
by kvshrao
You can use DSJOBReport after subroutine to generate the statistics

Posted: Tue Jan 07, 2014 8:08 am
by roy
Just wondering...

This is posted as a parallel job question.
Will it work the same?

How do you set/get the user status for the parallel jobs?

Posted: Tue Jan 07, 2014 1:58 pm
by kduke
There are a couple issues going on here. In older versions of DataStage you need to generate the xml to get the row counts. In newer versions it will generate these files automatically. You may need to set an option.

The dsjob command can generate the xml. You need -report option and XML option to generate an XML file that can be processed easily otherwise the file just shows the row counts. Just run the dsjob command from the UNIX or DOS prompt based on what type of server you are on. You need to run the command on the DataStage server not your client although it will run on your client. When you run a job that runs dsjob as a command then it runs on the server. So to simulate this run it on the server.

You can run this from a routine. Using DSAttach and all the API calls is a lot more work in my opinion than running the command. You need to be pretty good with DataStage BASIC to create these routines. There are tons of examples on this forum so it should not be too hard.

All EtlStats is a bunch of examples of how to get metadata out of DataStage. So it is a bunch of routines and jobs. You only need one of these jobs to get runtime metadata with row counts. One of the parameters for this job is the job you want row counts for. This job name will be fed into the before command which runs dsjob -report XML. It outputs to a file called job_name.xml. This file is the source the DS job. This file is processed 3 times to output to 3 tables.

ETL_JOB_HIST
ETL_ROW_HIST
ETL_PARAM_HIST

You can modify this job to output to any table or file you want. This job is a server job because it was written long time ago before PX jobs were popular. If you import then runtime XML into DataStage and look at it then it becomes obvious that it is fairly straight forward. It has more than one repeating group. The row counts are stored in one group and the parameters in another. Makes sense. The rows are tied to stage names and link names. If you are looking at row counts for PX jobs then it has row counts at the partition level. You need to aggregate these up to the link level or add partition number to the key of ETL_ROW_HIST. You really need to add STAGE_NAME to this key too. Link row counts are repeated because it reports input and output values. The input row count always matches the output row counts for a given link. You never get PX rejected rows unless you have a reject link attached to the target stage. So your job design is critical.

Parameter values are stored in a similar way within the XML. Most people don't care about this but it has saved us a lot of time because I have seen too many developers run a job with default parameters which point to the wrong database. Usually it points to DEV so development is easy. They may also run with the wrong config file which makes the job run slower. These values are valuable.

There are several SQL statements in EtlStats to get last row counts for all job and last parameter counts. So you can quickly see what the last run did.

One other job in EtlStats is very useful. The job name ends in "Driver". This job is used to run the job described above. If you run it with a job name of "ALL" then it will loop through every job in the project and get row counts for all jobs. Very useful when first setting this up. If you give it a job name that is a sequence then it will loop through all jobs in the job property field of "Dependents" and get row counts for all jobs run by that sequence.

It takes several minutes to get row counts for a job. Be careful not to wait on getting row counts after every job. If your job runs in a couple minutes and now you get row counts after then your ETL can take twice as long to finish. Much better to get row counts as a separate process.

Posted: Wed Mar 19, 2014 4:49 pm
by kduke
EtlStats is on Ray's web site.
http://www.raywurlod.com/