Hi All,
I need to get the input link row count, reject link row count and output link row count of all jobs that ran in a particular batch.
For example, if there are two job sequences in that batch and each sequence contains 5 jobs, I need to get the above information for each of them.
Please advise how to achieve this.
Thanks & Regards,
Kumar66.
To get the status of Jobs
Several ways, all of which have been discussed here ad nauseam. Classic answer is for you to go to Kim Duke's website and download his free ETLStats package. It does that and a crap-ton more... take what you need from it or leverage the whole package. Nowadays that kind of metadata is collected by DataStage directly in the Operations Console, depending on your version. Sorry, don't remember exactly when that was added but I'm sure some kind soul will chime in with the answer.
Otherwise it's all up to you to code something up that leverages the API. That means 'job control' code written in BASIC or UNIX scripts depending on whatever you are most comfortable building and maintaining. For example, from the command line you can use dsjob as noted here. All of the functions listed have an equivalent in BASIC as a 'DSGetxxxx' function.
-craig
"You can never have too many knives" -- Logan Nine Fingers
Re: To get the status of Jobs
To complement the previous post from chulett: Operations Console became available in release 8.7, launched around November 2011, and was further enhanced in version 9.1, notably with the addition of a Workload Management feature.
Amedhyaz
DataStage & Metadata Workbench Developer
Information Server Administrator
Re: To get the status of Jobs
I can't get the link to Kim Duke's website to launch... is that the correct link?
Argus Health
Re: To get the status of Jobs
You can use the DSJobReport after-job subroutine to generate the statistics.
Thanks
Ravi
Just wondering...
This is posted as a parallel job question.
Will it work the same?
How do you set/get the user status for the parallel jobs?
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
There are a couple of issues going on here. In older versions of DataStage you need to generate the XML yourself to get the row counts. In newer versions these files are generated automatically, although you may need to set an option.
The dsjob command can generate the XML. You need the -report option with the XML format option to generate an XML file that can be processed easily; otherwise the output just shows the row counts. Run the dsjob command from the UNIX or DOS prompt, depending on what type of server you are on. You need to run the command on the DataStage server, not your client (although it will also run on your client). When a job runs dsjob as a before/after command, it runs on the server, so to simulate this, run it on the server.
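As a rough sketch, the command described above can be wrapped from a script instead of typed at the prompt. The helper names below (dsjob_report_cmd, fetch_report) and the project/job names in the docstring are made up; only the dsjob -report syntax itself comes from the post, and dsjob is assumed to be on the PATH of the DataStage server:

```python
# Hedged sketch of driving "dsjob -report <project> <job> XML" from Python.
# Helper names are hypothetical; dsjob must be on the server's PATH.
import subprocess

def dsjob_report_cmd(project, job, fmt="XML"):
    """Build the command line: dsjob -report <project> <job> XML."""
    return ["dsjob", "-report", project, job, fmt]

def fetch_report(project, job, out_path):
    """Run dsjob on the DataStage server and save its report output
    to a file (e.g. job_name.xml), mirroring what EtlStats does."""
    result = subprocess.run(dsjob_report_cmd(project, job),
                            capture_output=True, text=True, check=True)
    with open(out_path, "w") as f:
        f.write(result.stdout)
```

The XML format is the one worth capturing to a file, since the plain report only shows the row counts.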
You can also run this from a routine. Using DSAttachJob and the rest of the API calls is, in my opinion, a lot more work than running the command. You need to be pretty good with DataStage BASIC to create these routines, but there are tons of examples on this forum, so it should not be too hard.
EtlStats as a whole is a bunch of examples of how to get metadata out of DataStage: a collection of routines and jobs. You only need one of these jobs to get runtime metadata with row counts. One of that job's parameters is the name of the job you want row counts for. That job name is fed into the before-job command, which runs dsjob -report XML and outputs to a file called job_name.xml. This file is the source for the DataStage job, and it is processed 3 times to output to 3 tables:
ETL_JOB_HIST
ETL_ROW_HIST
ETL_PARAM_HIST
You can modify this job to output to any table or file you want. This job is a server job because it was written a long time ago, before PX jobs were popular. If you import the runtime XML into DataStage and look at it, it becomes obvious that it is fairly straightforward. It has more than one repeating group: the row counts are stored in one group and the parameters in another, which makes sense. The row counts are tied to stage names and link names. If you are looking at row counts for PX jobs, they are reported at the partition level; you need to aggregate these up to the link level or add the partition number to the key of ETL_ROW_HIST. You really need to add STAGE_NAME to this key too. Link row counts are repeated because the report includes both input and output values, and the input row count always matches the output row count for a given link. You never get PX rejected rows unless you have a reject link attached to the target stage, so your job design is critical.
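Once the per-partition counts have been pulled out of the report XML (the exact element names depend on the report DTD, so extraction is omitted here), rolling them up to link level is a simple group-by. This is only an illustrative sketch with made-up job, stage and link names, keeping STAGE_NAME in the key as suggested above:

```python
# Aggregate PX partition-level row counts up to (job, stage, link) level.
# Input tuples and all names are hypothetical sample data.
from collections import defaultdict

def aggregate_to_link_level(partition_rows):
    """Sum per-partition row counts to (job, stage, link) level.

    STAGE_NAME stays in the key because link names alone need not be
    unique within a job."""
    totals = defaultdict(int)
    for job, stage, link, partition, rows in partition_rows:
        totals[(job, stage, link)] += rows
    return dict(totals)

sample = [
    ("LoadCust", "xfmMain", "lnkOut", 0, 500),
    ("LoadCust", "xfmMain", "lnkOut", 1, 480),
    ("LoadCust", "xfmMain", "lnkRej", 0, 3),
]
print(aggregate_to_link_level(sample))
# {('LoadCust', 'xfmMain', 'lnkOut'): 980, ('LoadCust', 'xfmMain', 'lnkRej'): 3}
```

The alternative, as noted above, is to keep the partition number in the ETL_ROW_HIST key and store the counts unaggregated.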
Parameter values are stored in a similar way within the XML. Most people don't care about this but it has saved us a lot of time because I have seen too many developers run a job with default parameters which point to the wrong database. Usually it points to DEV so development is easy. They may also run with the wrong config file which makes the job run slower. These values are valuable.
There are several SQL statements in EtlStats to get the last row counts for all jobs and the last parameter values, so you can quickly see what the last run did.
One other job in EtlStats is very useful: the job whose name ends in "Driver". This job is used to run the job described above. If you run it with a job name of "ALL", it will loop through every job in the project and get row counts for all of them, which is very useful when first setting this up. If you give it the name of a sequence, it will loop through all jobs in the "Dependents" job property and get row counts for all jobs run by that sequence.
It takes several minutes to get row counts for a job, so be careful about waiting on row counts after every job. If a job runs in a couple of minutes and you then gather row counts for it, your ETL can take twice as long to finish. It is much better to get row counts as a separate process.
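That separate-process idea, in the spirit of the "Driver" job, amounts to looping over the job list after the batch finishes and fetching one report at a time. A minimal sketch, assuming dsjob is on the server's PATH; the runner is injectable purely so the loop can be exercised without a DataStage engine, and all names are hypothetical:

```python
# Collect dsjob XML reports for a list of jobs as a post-batch pass,
# rather than after each individual job. Names are illustrative only.
import subprocess

def collect_reports(project, jobs, run=subprocess.run):
    """Fetch the XML report for each job; returns {job_name: xml_text}.

    `run` defaults to subprocess.run but can be swapped out for testing."""
    reports = {}
    for job in jobs:
        out = run(["dsjob", "-report", project, job, "XML"],
                  capture_output=True, text=True, check=True)
        reports[job] = out.stdout
    return reports
```

Run against the real server this would block for several minutes per job, which is exactly why it belongs in its own pass and not inline in the batch.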
Mamu Kim