Start and End time of a job

Posted: Wed Dec 01, 2004 11:10 pm
by rajeev_prabhuat
Hi,

I have 42 jobs, and we want to keep track of the following details for each job in a sequential file: job name, number of rows read, number of rows written, number of rows rejected, start time of the job, end time of the job, and its status.

Can you help me with this? Can it be done in one generic job, or does it have to be done in each individual job? To get the number of rows read and written we can use the system variables @INROWNUM and @OUTROWNUM, but I want to capture the other details as well. How can I go about this?

Regards,
Rajeev Prabhu

Posted: Wed Dec 01, 2004 11:25 pm
by vigneshra
You can make use of the DSJobReport routine to capture audit information about a job; it can be called as an after-job subroutine.
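
To set it up, open the job properties, choose DSJobReport as the after-job subroutine and give it an input value. If I remember right, the input value is the report type, optionally followed by a directory to write the report file to, something like this (the path here is only an example):

   1;/home/dsadm/audit

where 0 logs a basic summary (job name, status, start and end times), 1 adds per-stage and per-link row counts, and 2 also writes the report out to a file. Do check the routine's own description in your release, because the exact options differ between versions.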

Regards,
Vignesh.

Posted: Wed Dec 01, 2004 11:48 pm
by rajeev_prabhuat
Hi Vignesh,

Thank you.

Can we populate this data into a text file and then use it from there?

Regards,
Rajeev Prabhu

Posted: Wed Dec 01, 2004 11:54 pm
by vigneshra
You can do anything from writing to a sequential file to loading directly into an audit table (in which case you need a licensed ODBC driver for the database). Writing to a sequential file is very simple; you can browse through the DataStage BASIC manual to get a clearer picture.
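
For example, here is a minimal sketch (untested, and the stage names, link names and file path are only placeholders) of the body of a Before/After routine you could create in the Manager and attach as the after-job subroutine. Such a routine always receives InputArg and ErrorCode arguments. It appends one pipe-delimited audit line per run to a sequential file:

   * Sketch of an after-job subroutine: appends one audit line per run
   * to a sequential file. All names and the path are placeholders.
   $INCLUDE DSINCLUDE JOBCONTROL.H

         ErrorCode = 0   ;* non-zero would mark the job run as failed

         JobName   = DSGetJobInfo(DSJ.ME, DSJ.JOBNAME)
         StartTime = DSGetJobInfo(DSJ.ME, DSJ.JOBSTARTTIMESTAMP)
         Status    = DSGetJobInfo(DSJ.ME, DSJ.JOBINTERIMSTATUS)

   * Row counts come from whichever links you care about.
   * "SourceStage"/"InLink" and "TargetStage"/"OutLink" are made up.
         RowsRead    = DSGetLinkInfo(DSJ.ME, "SourceStage", "InLink", DSJ.LINKROWCOUNT)
         RowsWritten = DSGetLinkInfo(DSJ.ME, "TargetStage", "OutLink", DSJ.LINKROWCOUNT)

   * The job is still running when this routine fires, so take the
   * current date and time as the end time.
         EndTime = Oconv(Date(), "D-YMD[4,2,2]") : " " : Oconv(Time(), "MTS")

         AuditLine = JobName : "|" : RowsRead : "|" : RowsWritten : "|" : StartTime : "|" : EndTime : "|" : Status

         AuditFile = "/tmp/job_audit.txt"   ;* placeholder path
         OpenSeq AuditFile To FilePtr Else
            Create FilePtr Else
               Call DSLogWarn("Cannot open " : AuditFile, "WriteJobAudit")
               Return
            End
         End
         Seek FilePtr, 0, 2 Else Null   ;* move to end of file so we append
         WriteSeq AuditLine To FilePtr Else
            Call DSLogWarn("Cannot write to " : AuditFile, "WriteJobAudit")
         End
         CloseSeq FilePtr
         Return

The resulting text file can then be read by whatever reporting you like. Note that ErrorCode must be left at 0, otherwise the job run is reported as failed.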

Regards,
Vignesh.

Posted: Thu Dec 02, 2004 12:00 am
by rajeev_prabhuat
Hi Vignesh,

Thank you.

But since I am new to DataStage, I have a doubt: how can I write out or capture the data after executing this DS routine as the after-job subroutine? Also, if the job gets aborted, will the after-job routine still execute?

Regards,
Rajeev Prabhu

Posted: Thu Dec 02, 2004 6:51 am
by kduke
You could download EtlStats.zip from ADN or my tips page. It posts all the row counts to tables in the warehouse, but you could easily write these to a text file. You get all the jobs and the create-table statements. It posts link names and row counts. Very simple to set up. It will also do this for all jobs, for one job, or for all jobs in a sequence.

Posted: Thu Dec 02, 2004 9:31 pm
by rajeev_prabhuat
Hi,

Thank you, Kim.

Regards,
Rajeev Prabhu

Posted: Thu Dec 02, 2004 9:56 pm
by rajeev_prabhuat
Hi Kim,

But I have a doubt: how will I make use of this in my jobs? Can you give me the details of how to go about it?

Regards,
Rajeev Prabhu

Posted: Fri Dec 03, 2004 12:08 am
by kduke
We run this at the end of each sequence. It posts start and end times, duration in seconds, row counts and rows per second. It also posts parameter values. All of these can be used to measure performance over time. It also allows you to predict when all your sequences will finish. Soon I will have bytes per second and be able to compare this to transaction size and array size. Once I have all of this in tables in my target warehouse database, I can do much better analysis and performance tuning.

I would love to see us as a group do comparisons of rows per second and bytes per second across platforms. If you have similar servers with the same version of the target database, then you should see similar performance. If not, then why not? You can start to look for other issues. We as a group do very little sharing of this type of information. I think in some way we need to standardize what information we are tracking. What I am giving away is meant to help start this process of standard measurements for ETL. I am hoping for similar standards in documentation and maybe someday in metadata. I think the way most developers use metadata is terrible.

We are in a very new industry. Lots of people talk about best practices. Do they share them with all? Are they really "best"? How do you know? If you share these ideas in a public forum like DSX, then others can improve on them or mold them. Maybe we can all agree on some "best" practices. Maybe this is just a waste of time. I don't know. Maybe we will never agree on anything. I like to hope for the best. I hope others will share code and jobs and ideas that will change the way I think about something. I sure don't have all the answers; maybe Ray does.

I think there is a difference between a technical best practice and a textbook software lifecycle best practice. There is a difference between theory and practical use of a concept. Lots of people can quote the Gods of Data Warehousing, but can they truly design and build something solid? Is it maintainable after they are gone? Did it take them 3 times longer than it should? Did they use their resources effectively? Did they put so much pressure on their best talent that they quit or burned out? From my experience, more and more is expected from those that deliver and less and less from those that do not. Why is it that some people can deliver more work than others? Can these techniques be passed on so all develop at the same rate? Can all of us carry the details in our heads that Ray carries around? Do we want to? At what cost? How high a price did Ray pay to learn all that? My brain does not work like his. I could not remember all that if I tried. So I build generic solutions which will work anywhere, like EtlStats and GenHtml, and then I can forget about it. My row counts will be there waiting when I need to improve performance. My documentation is already generated when somebody asks for it. No worries mate.

All of this should be practical. If you see no need for this then don't do it. I think all my customers benefit from more complete solutions and methodologies. Better documentation is a part of a better solution. The amount you get paid should be based on the amount of work produced and the quality of the work.

Posted: Fri Dec 03, 2004 12:57 am
by rajeev_prabhuat
Hi Kim,

I got all the jobs that you have given in the link, but can you explain how I can start working on this? And how do I go about compiling them?

Regards,
Rajeev Prabhu

Posted: Fri Dec 03, 2004 7:43 am
by kduke
You will need to create the tables; the scripts are there. You need to run BuildJobStatusHash, which builds a hash file used by DSJobReportDb. The controlling job is DSJobReportDbDriver. If you put All in the Category parameter, it will load row counts for all jobs. To get row counts at the end of your sequence, run it at the end of the sequence and put the sequence job name in the Category parameter.

If you run this twice on the same job, it will not cause a problem. All the tables have start time as one of the keys, so you will just overwrite the same record. I have reports which get the max start time and print rows per second for the last run. I run these in Access so I can cut and paste them into emails and spreadsheets. I plan on emailing these directly soon.

If you have the jobs LoadEtlJob2, LoadEtlJob and LoadEtlJobLink, run them in this order before running reports. All these tables do is allow you to link a job to a category in the ETL_JOB table. I also plan on taking the job link name and looking up the table name, but for now all you can do is look up the link type. This will tell you whether it is a hash file or an Oracle table or whatever.
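
If you just want to see the kind of calls a driver job like that makes, the general mechanism (this is only an illustration, not the EtlStats code, and the job, stage and link names are invented) is to attach to a job and read its run details through the DSJ API:

   * Illustrative DataStage BASIC fragment: attach to a finished job and
   * pull its run details. "LoadCustomers", "TargetStage" and "OutLink"
   * are made-up names.
   $INCLUDE DSINCLUDE JOBCONTROL.H

         JobHandle = DSAttachJob("LoadCustomers", DSJ.ERRNONE)

         StartTime = DSGetJobInfo(JobHandle, DSJ.JOBSTARTTIMESTAMP)
         EndTime   = DSGetJobInfo(JobHandle, DSJ.JOBLASTTIMESTAMP)
         Status    = DSGetJobInfo(JobHandle, DSJ.JOBSTATUS)
         RowCount  = DSGetLinkInfo(JobHandle, "TargetStage", "OutLink", DSJ.LINKROWCOUNT)

         Call DSLogInfo("LoadCustomers ran " : StartTime : " to " : EndTime : ", status " : Status : ", rows " : RowCount, "AuditSketch")

         Dummy = DSDetachJob(JobHandle)

From there it is just a matter of writing those values to your tables or a text file, which is what the EtlStats jobs do for every job in the category you pass in.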

Posted: Tue May 17, 2005 6:50 am
by Madhav_M
Hi
How can a variable captured in a routine be output to a table?

Thanks
Madi

Posted: Tue May 17, 2005 8:08 am
by kduke
You can call this routine from a derivation in a job's Transformer.
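
For example, if the routine is a transform function (MyCaptureRoutine is just a made-up name here), put it in the derivation of an output column in a Transformer stage and point that output link at your table:

   MyCaptureRoutine(DSJobName)

Whatever the routine returns becomes that column's value on each row, so the table ends up holding the captured value.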