
Capturing Job Information? Please Help ASAP!

Posted: Sun Apr 10, 2005 2:57 am
by vigneshra
Hi,

In our project, we need to develop an ETL auditing process as part of ensuring data quality. For that we need an automated process (either an ETL job or a routine) that captures all information about the jobs in our project. The information is to be stored in separate database tables as below (immediately after the development phase is over).

ETL Job
Job Name
Job Description

Stage
Stage Name
Stage Type (Aggregator, Transformer, etc.)
Stage Description

Stage Link Relationship
Link Type (input, output)
Reject Link Flag

Link
Link Name

How can we achieve this? Has anybody done this in their project? Please share your thoughts!

Posted: Sun Apr 10, 2005 4:59 am
by roy
Hi,
You can try checking the DSMakeJobReport routine. In case it is not yet there in 7.0.2, go over DSGetJobInfo, DSGetLinkInfo and such for the relevant options to get the info you need.

Depending on the info you need, you might also need to use DSGetLogEntry/DSGetLogSummary and such to extract things like the last run time of the job in question.

On top of this, try searching this forum for more info (I don't have 7.0.2 available to check your exact options :().
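
Something like this should get you started - a minimal sketch from memory, untested; "MyJob" is a placeholder, and the report-level and separator values are worth double-checking in the BASIC guide:

Code:

$INCLUDE DSINCLUDE JOBCONTROL.H   ;* pulls in the DSJ.* constants
hJob = DSAttachJob("MyJob", DSJ.ERRFATAL)
Report = DSMakeJobReport(hJob, 1, "CRLF")   ;* level 1 should add stage/link detail
Call DSLogInfo(Report, "JobReport")
ErrCode = DSDetachJob(hJob)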

IHTH,

Posted: Sun Apr 10, 2005 5:27 am
by ArndW
Vigneshra,

this is not difficult to do using the job control interface calls.

The main workhorse routine to get this information is DSGetJobInfo, plus the other DSGet... routines to find your detail information. I have done this in a BASIC function to populate a client's tables (the tables have a different setup and format, but the content is essentially the same).
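
In essence (a rough, untested sketch - the job name is a placeholder):

Code:

hJob = DSAttachJob("MyJob", DSJ.ERRFATAL)
JobName = DSGetJobInfo(hJob, DSJ.JOBNAME)
StageList = DSGetJobInfo(hJob, DSJ.STAGELIST)   ;* comma-delimited stage names
ErrCode = DSDetachJob(hJob)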

Posted: Sun Apr 10, 2005 5:42 am
by vigneshra
Roy and Arnd,

Thanks for your quick response! To be honest, I have never tried these DataStage built-in BASIC routines. I just want to know if these routines can give details like the type of a stage (Transformer, Sorter, Aggregator, etc.) and stage-link relationships (e.g. In_Trns is the input link and Out_Trns is the output link of the Transformer Trns_record). If that is possible, then I can go ahead and implement the idea. Also, this is the first time I will be working with those routines, so I just want to know if there is anything specific to be considered before using them.

Thanks!

Posted: Sun Apr 10, 2005 5:57 am
by ArndW
Vigneshra,

the routines are quite easy to use, and the documentation of the DS... routines in the BASIC PDF manual is quite good. All of the calls you need are DSAttachJob, DSDetachJob, DSGetJobInfo, DSGetProjectInfo, DSGetStageInfo, DSGetLinkInfo, and DSGetParamInfo.
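
Put together, the skeleton is something like this (an untested sketch from memory; "MyJob" is a placeholder, and the exact DSJ.* constants are in the BASIC guide):

Code:

$INCLUDE DSINCLUDE JOBCONTROL.H   ;* pulls in the DSJ.* constants
hJob = DSAttachJob("MyJob", DSJ.ERRFATAL)
StageList = DSGetJobInfo(hJob, DSJ.STAGELIST)
For s = 1 To DCount(StageList, ",")
   StageName = Field(StageList, ",", s)
   StageType = DSGetStageInfo(hJob, StageName, DSJ.STAGETYPE)
   LinkList = DSGetStageInfo(hJob, StageName, DSJ.LINKLIST)
   For l = 1 To DCount(LinkList, ",")
      LinkName = Field(LinkList, ",", l)
      RowCount = DSGetLinkInfo(hJob, StageName, LinkName, DSJ.LINKROWCOUNT)
      * write StageName, StageType, LinkName, RowCount to your audit tables here
   Next l
Next s
ErrCode = DSDetachJob(hJob)

As far as I recall, these calls do not tell you the link direction (input/output) directly - you may have to get that from the Repository, as discussed below.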

Posted: Sun Apr 10, 2005 7:14 am
by kduke
If you upgrade, you could use EtlStats. This is a set of jobs and routines that gets most of this. It updates a set of tables in the target like ETL_JOB, ETL_ROW_HIST and ETL_LINK. Everything you need except the two items below is already built for you.
Stage Link Relationship
Link Type (input, output)
Reject Link Flag
These should be easy to add. I am not sure whether all of these can be loaded from the routine calls supplied by Ascential. If I get some time this week, I will help isolate the last two fields you need. There is a post, maybe from you, about trying to get EtlStats to import into 7.0. Do a search; there are lots of posts about EtlStats. Nobody else seems to have a solution which they are willing to share with DSX. Shame, really.

If you download DwNav from my web site, it will create a series of hash files. The auth code was posted here yesterday. To get row counts, execute a TCL command:

Code:

DWN_RT_ROW_COUNT.uvb JobName
This will populate hash files:

Code:

DWN_RT_JOB_STAT_HIST
DWN_RT_JOB_STAT
DWN_RT_ROW_COUNT_HIST
DWN_RT_ROW_COUNT
DWN_RT_PARAM_HIST
DWN_RT_PARAM
These all have dictionary items. Import their metadata and load them to a table of your choice. This code will work in versions 5, 6 and 7 of DataStage. It stores the Link Type above as well. I have posted code from these if you still want to build it yourself; do a search. The hash files with HIST in the name have the job run time as part of the key, which allows you to compare one run to the previous run. This is much easier in your target database, so copy the data there. If you go this route, then let me know and I will show you how to get all of what you need.
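
Moving the data out is plain hash file reading - a minimal, untested sketch (the record layout is whatever the dictionary items describe):

Code:

Open "DWN_RT_ROW_COUNT" To RowCounts Else Stop "Cannot open hash file"
Select RowCounts
Loop
   ReadNext Id Else Exit
   Read Rec From RowCounts, Id Then
      * Rec is an @FM-delimited dynamic array - map the fields per the
      * dictionary items, then write them to your target table.
      Call DSLogInfo("Key ":Id:" -> ":Rec, "HashDump")
   End
Repeat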

This is valuable information, so keep trying until you get it all. Post your solution and what you are still missing, and we will help finish it.

Posted: Sun Apr 10, 2005 2:52 pm
by ray.wurlod
All of the information Vignesh seeks can be found in the DataStage Repository - and therefore can be reported on using the Reporting Tool (Documentation Assistant) or MetaStage.

In current versions, indeed, all the information is in a single table, but this is one of those tables for which complete column definitions have not been published.
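
For example, you can list the published job table from BASIC (a hedged sketch - DS_JOBS is documented, but the design table's columns, as noted, are not):

Code:

Execute "LIST DS_JOBS" Capturing Output
Call DSLogInfo(Output, "RepoList")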

Posted: Sun Apr 10, 2005 5:25 pm
by kduke
You are correct. There is no mention of row counts or run times. Run the Reporting Assistant; DSLinks should have most of this.

Posted: Sun Apr 10, 2005 7:46 pm
by vmcburney
I don't think parallel jobs can be exported to the Reporting Assistant; they certainly do not turn up on the "Jobs" tab when I run it. MetaStage is a better option, and by the next release we will probably see the Reporting Assistant disappear altogether, replaced by better metadata reporting straight out of DataStage.

Using a shared container...?

Posted: Mon Apr 11, 2005 8:43 am
by Transmogorifier
An easy way to achieve this would be to create a shared container with a triggered input from the jobs you are trying to audit. I'm afraid this would be just patchwork, but it still works. Include a few parameters which you can tune to the stages and links in the job, and then use the BASIC job functions in a Transformer stage in the shared container to capture details like the job name, transform stage, link row count, etc. (see the sketch below). Also include a flat file or a database stage inside the shared container to write the data into the tables in the database. But this would entail adding this shared container stage to all the existing jobs as well as to all jobs developed in the future.
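
For example, the Transformer derivations could use the in-job macros and functions - a minimal, untested sketch (DSJ.ME means the currently running job; the stage and link names are just the ones from the earlier post):

Code:

JobName = DSJobName   ;* built-in macro for the current job's name
RowCount = DSGetLinkInfo(DSJ.ME, "Trns_record", "Out_Trns", DSJ.LINKROWCOUNT)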

Hope this helps.
Cheers. :)