Capturing Job Information? Pls Help asap!

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

vigneshra
Participant
Posts: 86
Joined: Wed Jun 09, 2004 6:07 am
Location: Chennai

Capturing Job Information? Pls Help asap!

Post by vigneshra »

Hi,

In our project, we need to develop an ETL auditing process as part of ensuring data quality. For that we need an automated process (either an ETL job or a routine) that captures information about all the jobs in our project. The information is to be stored in separate database tables, as below, immediately after the development phase is over.

ETL Job
Job Name
Job Description

Stage
Stage Name
Stage Type (Aggregator, transformer, etc.)
Stage Description

Stage Link Relationship
Link Type (input, output)
Reject Link Flag

Link
Link Name

How can this be achieved? Has anybody done this in your project? Please share your thoughts!
Vignesh.

"A conclusion is simply the place where you got tired of thinking."
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
You can try the DSMakeJobReport routine. If it is not yet available in 7.0.2, go over DSGetJobInfo, DSGetLinkInfo and the like for the relevant options to get the information you need.

Depending on the information you need, you might also need DSGetLogEntry/DSGetLogSummary and such to extract things like the last run time of the job in question.

On top of this, try searching this forum for more information (I don't have 7.0.2 available to check your exact options :().
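For illustration, a minimal sketch of calling DSMakeJobReport from job control (the job name "MyJob" is hypothetical; check the BASIC guide for the report levels available in your release):

      * Attach the job, build a text report of its stages and links,
      * then detach. Higher report levels include more stage/link detail.
      hJob = DSAttachJob("MyJob", DSJ.ERRWARN)
      Report = DSMakeJobReport(hJob, 2, "CRLF")
      Call DSLogInfo(Report, "JobReport")
      ErrCode = DSDetachJob(hJob)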

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Vigneshra,

this is not difficult to do using the job control interface calls.

The main workhorse routine to get this information is DSGetJobInfo, plus the other DSGet... routines to find your detail information. I have done this in a BASIC function to populate the client's tables (the tables differ in setup and format, but the content is essentially the same).
vigneshra
Participant
Posts: 86
Joined: Wed Jun 09, 2004 6:07 am
Location: Chennai

Post by vigneshra »

Roy and Arnd,

Thanks for your quick response! To be honest, I have never tried these built-in DataStage BASIC routines. I just want to know whether these routines can give details like the type of a stage (Transformer, Sorter, Aggregator, etc.) and stage-link relationships (e.g. In_Trns is the input link and Out_Trns is the output link of the Transformer Trns_record). If it is possible, then I can go ahead and implement the idea. Also, this is the first time I will be working with those routines, so I just want to know if there is anything specific to be considered before using them.

Thanks!
Vignesh.

"A conclusion is simply the place where you got tired of thinking."
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Vigneshra,

the routines are quite easy to use, and the documentation of the DS... routines in the BASIC PDF help file is quite good. All of the calls you need are DSAttachJob, DSDetachJob, DSGetJobInfo, DSGetProjectInfo, DSGetStageInfo, DSGetLinkInfo and DSGetParamInfo.
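By way of illustration, a sketch of how those calls chain together (the job name "MyJob" is hypothetical, and I am not certain 7.0.2 exposes constants for link direction or the reject flag, so those items may have to come from the Repository instead):

      * Walk every stage and link of a job and log what we find.
      hJob = DSAttachJob("MyJob", DSJ.ERRFATAL)
      StageList = DSGetJobInfo(hJob, DSJ.STAGELIST)   ;* comma-separated
      FOR i = 1 TO DCOUNT(StageList, ",")
         StageName = FIELD(StageList, ",", i)
         StageType = DSGetStageInfo(hJob, StageName, DSJ.STAGETYPE)
         LinkList  = DSGetStageInfo(hJob, StageName, DSJ.LINKLIST)
         FOR j = 1 TO DCOUNT(LinkList, ",")
            LinkName = FIELD(LinkList, ",", j)
            Rows = DSGetLinkInfo(hJob, StageName, LinkName, DSJ.LINKROWCOUNT)
            Call DSLogInfo(StageName:" (":StageType:") link ":LinkName:" rows=":Rows, "Audit")
         NEXT j
      NEXT i
      ErrCode = DSDetachJob(hJob)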
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

If you would upgrade, then you could use EtlStats. This is a set of jobs and routines that gets most of this. It updates a set of tables in the target such as ETL_JOB, ETL_ROW_HIST and ETL_LINK. Everything you need except the two items below is already built for you:

Stage Link Relationship
Link Type (input, output)
Reject Link Flag

These should be easy to add. I am not sure if all of these can be loaded from the routine calls supplied by Ascential. If I get some time this week, I will help isolate the last two fields you need. There is a post, maybe from you, on trying to get EtlStats to import into 7.0. Do a search; there are lots of posts about EtlStats. Nobody else seems to have a solution which they are willing to share with DSX. Shame really.

If you download DwNav from my web site, it will create a series of hash files. The auth code was posted here yesterday. To get row counts, execute this TCL command:

      DWN_RT_ROW_COUNT.uvb JobName

This will populate these hash files:

      DWN_RT_JOB_STAT_HIST
      DWN_RT_JOB_STAT
      DWN_RT_ROW_COUNT_HIST
      DWN_RT_ROW_COUNT
      DWN_RT_PARAM_HIST
      DWN_RT_PARAM
These all have dictionary items. Import their metadata and load them into a table of your choice. This code will work in versions 5, 6 and 7 of DataStage. It stores the Link Type above as well. I have posted code from these if you still want to build it yourself; do a search. The hash files with HIST in the name have the job run time as part of the key, which allows you to compare one run to the previous run. This is much easier to do in your target database, so copy the data there. If you go this route, let me know and I will show you how to get all of what you need.
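For what it's worth, once the hash files exist they can be inspected straight from the TCL prompt with the standard UniVerse LIST command (the columns shown depend on the dictionary items mentioned above):

      LIST DWN_RT_ROW_COUNT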

This is valuable information, so keep trying until you get it all. Post your solution and what you are still missing, and we will help finish it.
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

All of the information Vignesh seeks can be found in the DataStage Repository - and therefore can be reported on using the Reporting Tool (Documentation Assistant) or MetaStage.

In current versions, indeed, all the information is in a single table, but this is one of those tables for which complete column definitions have not been published.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

You are correct. There is no mention of row counts or run times. Run the Reporting Assistant; DSLinks should have most of this.
Mamu Kim
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

I don't think parallel jobs can be exported to the Reporting Assistant; they certainly do not turn up on the "Jobs" tab when I run it. MetaStage is a better option, and by the next release we will probably see the Reporting Assistant disappear altogether, replaced by better metadata reporting straight out of DataStage.
Transmogorifier
Participant
Posts: 9
Joined: Sat Feb 12, 2005 5:24 pm

Using shared container....?

Post by Transmogorifier »

An easy way to achieve this would be to create a shared container with a triggered input from the jobs you are trying to audit. I'm afraid this would be just patchwork, but it still works. Include a few parameters which you can tune to the stages and links in each job, then use the BASIC job functions in a Transformer stage in the shared container to capture details like the job name, stage names and link row counts. Also include a flat file or database stage inside the shared container to write the data into the tables in the database. But this would entail adding the shared container to every existing job as well as to all jobs developed in the future.
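A sketch of the same idea done as an after-job subroutine instead of a shared container (the routine name and file path are hypothetical; you would attach it via the job properties' after-job subroutine field):

      SUBROUTINE AuditLinks(InputArg, ErrorCode)
      * Write one audit line per link of the current job (DSJ.ME) to a
      * text file, for later loading into the audit tables.
      ErrorCode = 0
      JobName   = DSGetJobInfo(DSJ.ME, DSJ.JOBNAME)
      StageList = DSGetJobInfo(DSJ.ME, DSJ.STAGELIST)
      OPENSEQ "/tmp/etl_audit.txt" TO F ELSE
         CREATE F ELSE ErrorCode = 1 ; RETURN
      END
      FOR i = 1 TO DCOUNT(StageList, ",")
         StageName = FIELD(StageList, ",", i)
         LinkList = DSGetStageInfo(DSJ.ME, StageName, DSJ.LINKLIST)
         FOR j = 1 TO DCOUNT(LinkList, ",")
            LinkName = FIELD(LinkList, ",", j)
            Rows = DSGetLinkInfo(DSJ.ME, StageName, LinkName, DSJ.LINKROWCOUNT)
            WRITESEQ JobName:",":StageName:",":LinkName:",":Rows TO F ELSE NULL
         NEXT j
      NEXT i
      CLOSESEQ F
      RETURN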

Hope this helps.
Cheers. :)