Start and End time of a job

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

rajeev_prabhuat
Participant
Posts: 136
Joined: Wed Sep 29, 2004 5:56 am
Location: Chennai
Contact:

Start and End time of a job

Post by rajeev_prabhuat »

Hi,

I have 42 jobs, and we want to keep track of the following details for each job in a sequential file: job name, no. of rows read, no. of rows written, no. of rows rejected, start time of the job, end time of the job and its status.

Can you help me in doing this? Can it be done in one generic job, or does it have to be done in each individual job? To get the no. of rows read and written we can use the system variables @INROWNUM and @OUTROWNUM, but I also want to capture the other details. How can I go about this?

Regards,
Rajeev Prabhu
vigneshra
Participant
Posts: 86
Joined: Wed Jun 09, 2004 6:07 am
Location: Chennai

Post by vigneshra »

You can make use of the DSJobReport routine to capture audit information about a job; it can be called as an after-job subroutine.
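
If I remember the syntax right, you just select it in the job properties and the input value controls the report type and the target directory (the path below is only an example):

Code: Select all

	After-job subroutine:  DSJobReport
	Input value:           1;/home/dsadm/reports

where report type 0 gives a basic summary, 1 adds stage and link detail, and 2 writes an XML report instead of text. Do check the manual though, I am quoting this from memory.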

Regards,
Vignesh.
rajeev_prabhuat
Participant
Posts: 136
Joined: Wed Sep 29, 2004 5:56 am
Location: Chennai
Contact:

Post by rajeev_prabhuat »

Hi Vignesh,

Thank you.

Can we populate this data into a text file and then take the data from there for our use?

Regards,
Rajeev Prabhu
vigneshra wrote:You can make use of the DSJobReport routine to capture audit information about a job; it can be called as an after-job subroutine.

Regards,
Vignesh.
vigneshra
Participant
Posts: 86
Joined: Wed Jun 09, 2004 6:07 am
Location: Chennai

Post by vigneshra »

You can do anything from writing to a sequential file to loading directly into an audit table (in which case you need a licensed ODBC driver for the database). Writing into a sequential file is very simple; you can browse through the DataStage BASIC manual to get a better picture.
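
To give you an idea, an after-job subroutine along these lines would append one audit line per run to a text file. This is only an untested sketch: the file path and the stage/link names ("OutStage", "OutLink") are placeholders you must replace with your own, and do verify the DSJ.* constants against the BASIC manual.

Code: Select all

	SUBROUTINE JobAuditToFile(InputArg, ErrorCode)
	* Sketch of an after-job subroutine: appends job name, row count,
	* start/end timestamps and status to a comma-separated text file.
	   ErrorCode = 0
	   JobName   = DSGetJobInfo(DSJ.ME, DSJ.JOBNAME)
	   StartTime = DSGetJobInfo(DSJ.ME, DSJ.JOBSTARTTIMESTAMP)
	   EndTime   = DSGetJobInfo(DSJ.ME, DSJ.JOBLASTTIMESTAMP)
	   Status    = DSGetJobInfo(DSJ.ME, DSJ.JOBSTATUS)
	   RowsOut   = DSGetLinkInfo(DSJ.ME, "OutStage", "OutLink", DSJ.LINKROWCOUNT)
	   Line = JobName : "," : RowsOut : "," : StartTime : "," : EndTime : "," : Status
	   OpenSeq "/tmp/job_audit.txt" To FileVar Then
	      Seek FileVar, 0, 2 Else Null   ;* position at end of file to append
	      WriteSeq Line To FileVar Else ErrorCode = 1
	      CloseSeq FileVar
	   End Else
	      Create FileVar Then            ;* file did not exist yet - create it
	         WriteSeq Line To FileVar Else ErrorCode = 1
	         CloseSeq FileVar
	      End Else ErrorCode = 1
	   End
	RETURN

Compile it as a before/after routine in the Manager and pick it as the after-job subroutine in the job properties.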

Regards,
Vignesh.
rajeev_prabhuat
Participant
Posts: 136
Joined: Wed Sep 29, 2004 5:56 am
Location: Chennai
Contact:

Post by rajeev_prabhuat »

Hi Vignesh,

Thank you.

But since I am new to DataStage, I have a doubt: how can I write out or pick up the data after this routine executes as the after-job routine? Also, what will happen if the job gets aborted? Will this after-job routine still execute?

Regards,
Rajeev Prabhu
vigneshra wrote:You can do anything from writing to a sequential file to loading directly into an audit table (in which case you need a licensed ODBC driver for the database). Writing into a sequential file is very simple; you can browse through the DataStage BASIC manual to get a better picture.

Regards,
Vignesh.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

You could download EtlStats.zip from ADN or my tips page. It posts all the row counts to tables in the warehouse, but you could easily write these to a text file instead. You get all the jobs and the create-table statements. It posts link names and row counts. Very simple to set up. It will also do this for all jobs, one job, or all jobs in a sequence.
Mamu Kim
rajeev_prabhuat
Participant
Posts: 136
Joined: Wed Sep 29, 2004 5:56 am
Location: Chennai
Contact:

Post by rajeev_prabhuat »

Hi,

Thank you, Kim.

Regards,
Rajeev Prabhu
kduke wrote:You could download EtlStats.zip from ADN or my tips page. It posts all the row counts to tables in the warehouse, but you could easily write these to a text file instead. You get all the jobs and the create-table statements. It posts link names and row counts. Very simple to set up. It will also do this for all jobs, one job, or all jobs in a sequence.
rajeev_prabhuat
Participant
Posts: 136
Joined: Wed Sep 29, 2004 5:56 am
Location: Chennai
Contact:

Post by rajeev_prabhuat »

Hi Kim,

But I have a doubt: how will I make use of this in my jobs? Can you give me the details of how to go about it?

Regards,
Rajeev Prabhu
kduke wrote:You could download EtlStats.zip from ADN or my tips page. It posts all the row counts to tables in the warehouse, but you could easily write these to a text file instead. You get all the jobs and the create-table statements. It posts link names and row counts. Very simple to set up. It will also do this for all jobs, one job, or all jobs in a sequence.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

We run this at the end of each sequence. It posts start and end times, duration in seconds, rows and rows per second. It also posts parameter values. All of these can be used to measure performance over time. It also allows you to predict when all your sequences will finish. Soon I will have bytes per second and be able to compare this to transaction size and array size. Once I have all of this in tables in my target warehouse database then I can do much better analysis and performance tuning.

I would love to see us as a group do comparisons of rows per second and bytes per second across platforms. If you have similar servers with the same version of the target database, then you should see similar performance. If not, then why not? You can start to look for other issues. We as a group do very little sharing of this type of information. I think in some way we need to standardize what information we are tracking. What I am giving away is to help start this process of standard measurements for ETL. I am hoping for similar standards in documentation and maybe someday in metadata. I think the way most developers use metadata is terrible.

We are in a very new industry. Lots of people talk about best practices. Do they share them with everyone? Are they really "best"? How do you know? If you share these ideas in a public forum like DSX, then others can improve on them or mold them. Maybe we can all agree on some "best" practices. Maybe this is just a waste of time. I don't know. Maybe we will never agree on anything. I like to hope for the best. I hope others will share code and jobs and ideas that will change the way I think about something. I sure don't have all the answers; maybe Ray does.

I think there is a difference between a technical best practice and a textbook software-lifecycle best practice. There is a difference between theory and the practical use of a concept. Lots of people can quote the Gods of Data Warehousing, but can they truly design and build something solid? Is it maintainable after they are gone? Did it take them three times longer than it should? Did they use their resources effectively? Did they put so much pressure on their best talent that they quit or burned out? From my experience, more and more is expected from those that deliver and less and less from those that do not. Why is it that some people can deliver more work than others? Can these techniques be passed on so all develop at the same rate? Can all of us carry the details in our heads that Ray carries around? Do we want to? At what cost? How high a price did Ray pay to learn all that? My brain does not work like his. I could not remember all that if I tried. So I build generic solutions which will work anywhere, like EtlStats and GenHtml, and then I can forget about it. My row counts will be there waiting when I need to improve performance. My documentation is already generated when somebody asks for it. No worries mate.

All of this should be practical. If you see no need for this then don't do it. I think all my customers benefit from more complete solutions and methodologies. Better documentation is a part of a better solution. The amount you get paid should be based on the amount of work produced and the quality of the work.
Mamu Kim
rajeev_prabhuat
Participant
Posts: 136
Joined: Wed Sep 29, 2004 5:56 am
Location: Chennai
Contact:

Post by rajeev_prabhuat »

Hi Kim,

I got all the jobs that you have given in the link, but can you explain how I can start working on this? And how can I go forward in compiling this?

Regards,
Rajeev Prabhu
kduke wrote:You could download EtlStats.zip from ADN or my tips page. It posts all the row counts to tables in the warehouse, but you could easily write these to a text file instead. You get all the jobs and the create-table statements. It posts link names and row counts. Very simple to set up. It will also do this for all jobs, one job, or all jobs in a sequence.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

You will need to create the tables; the scripts are there. You need to run BuildJobStatusHash. This builds a hash file used by DSJobReportDb. The controlling job is DSJobReportDbDriver. If you put All in the Category parameter, it will load row counts for all jobs. To get row counts at the end of your sequence, run it at the end of the sequence and put the sequence job name in the Category parameter.
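
If it helps, this is roughly how you could fire the driver job from BASIC, say from a routine called at the end of your sequence (untested sketch; "MySequence" stands for your own sequence name):

Code: Select all

	* Sketch: run DSJobReportDbDriver for the jobs in one sequence.
	hJob = DSAttachJob("DSJobReportDbDriver", DSJ.ERRFATAL)
	ErrCode = DSSetParam(hJob, "Category", "MySequence")  ;* or "All" for all jobs
	ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
	ErrCode = DSWaitForJob(hJob)
	ErrCode = DSDetachJob(hJob)

You could just as easily add it as a Job Activity at the end of the sequence itself.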

If you run this twice on the same job, it will not cause a problem. All the tables have start time as one of the keys, so you will just overwrite the same record. I have reports which get the max start time and print rows per second for the last run. I run these in Access so I can cut and paste them into emails and spreadsheets. I plan on emailing these directly soon.

If you have the jobs LoadEtlJob2, LoadEtlJob and LoadEtlJobLink, then run them in this order before running reports. All these tables do is allow you to link a job to a category in the ETL_JOB table. I also plan on taking the job link name and looking up the table name, but for now all you can do is look up the link type. This will tell you whether it is a hash file or an Oracle table or whatever.
Mamu Kim
Madhav_M
Participant
Posts: 43
Joined: Sat Jul 10, 2004 5:47 am

Post by Madhav_M »

Hi
How can a variable captured in a routine be output to a table?

Thanks
Madi
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

You can call this routine in a job's transform.
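
Something like this (untested, and the routine name is made up): create a Transform Function whose body leaves the captured value in Ans, then reference it in an output column derivation of a Transformer whose output link writes to your table.

Code: Select all

	* Body of a Transform Function routine, e.g. GetJobStartTS(Dummy).
	* The routine editor supplies the FUNCTION/RETURN wrapper; the
	* value left in Ans is what the derivation receives.
	Ans = DSGetJobInfo(DSJ.ME, DSJ.JOBSTARTTIMESTAMP)

In the Transformer, set the column derivation to GetJobStartTS(1) and the value flows down the link into the table.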
Mamu Kim