How can I generate ETL stats on all jobs in a project?
Moderators: chulett, rschirm, roy
How can I generate ETL stats on all jobs in a project?
I was looking at DSXchange to find the approach to generate the report on datastage project with the run times of each sequencer, sub jobs and the stages in the individual jobs. The details provided in the discussion forums are very helpful and make the work easier.
I ran the dsjob - report on a parallel job and i dont see the information regarding dataset stages.
I wanted to know if I can use the ETL stats to generate the run time stats for all the jobs in a project on Version 8x ( parallel jobs) or if there is any other utility to accomplish this. Please let me know if any steps that I should follow.
Thanks
[* Note - Title made more descriptive - Andy *]
I ran the dsjob - report on a parallel job and i dont see the information regarding dataset stages.
I wanted to know if I can use the ETL stats to generate the run time stats for all the jobs in a project on Version 8x ( parallel jobs) or if there is any other utility to accomplish this. Please let me know if any steps that I should follow.
Thanks
[* Note - Title made more descriptive - Andy *]
Modify stage seems to block row counts. I have never seen an issue on datasets. The jobs need to be modified to get row counts for PX jobs. Vincent and others have modified it so it works fine. The issue is PX store row counts at the partition level. So you either need to aggregate all the partitions or modify ETL_ROW_HIST to have partition id as part of the key.
The Driver job can take a job name, sequence name, folder name or 'ALL' and will get row counts for one job or all jobs in a sequence, folder or just all jobs. Big project can run a hour or more. Very useful though.
The Driver job can take a job name, sequence name, folder name or 'ALL' and will get row counts for one job or all jobs in a sequence, folder or just all jobs. Big project can run a hour or more. Very useful though.
Mamu Kim
Someone with version 8 needs to respond to how it works in that version. I do not have v8 to try it.
If someone fixes these jobs and wants to post their fix for PX along with table changes then zip up your dsx and ddl and email it to me. I will post it on my web site. I will consider posting any other generic job or script someone might think is useful. Make sure your name is in the job properties and script comments. I do not want to take credit for another persons work.
If someone fixes these jobs and wants to post their fix for PX along with table changes then zip up your dsx and ddl and email it to me. I will post it on my web site. I will consider posting any other generic job or script someone might think is useful. Make sure your name is in the job properties and script comments. I do not want to take credit for another persons work.
Mamu Kim
Seems like you might not be aware that ETLStats is a pre-built package of jobs / scripts that Kim Duke create and donated to the community via his website. You'll probably need / want to tweak or extend it and you may have issues with 8.x jobs but at least it will give you a huge leg up with your 'urgent' need.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Charter Member
- Posts: 193
- Joined: Tue Sep 05, 2006 8:01 pm
- Location: Australia
sorry to disappoint you but in this forum nothing is urgent as everyone is just a volunteer. How long is a piece of string? You have not specified your requirements and even then it depends on who is doing the job. An expert could do it in less time than a beginner. btw you should also have courtesy to start a new thread and not hijack someone else's thread.sohasaid wrote:Thanks chulett, you're right I've no previous idea about this feature in DataStage and it's urgent as the client needs such an ETL report and I need to know how to implement first, and how much time it will take.
Thanks again.
DS consultant.
-
- Participant
- Posts: 612
- Joined: Thu May 03, 2007 4:59 am
- Location: Melbourne
XMETA (Repository) will be the right place to look for to generate such a report in 8x versions (Infosphere) especially if logging on to XMETA is switched on.
Joshy George
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>
Thank you all very much for the inputs.
Details on the ETL Report that I need.
We have some Sequencer Jobs and each sequencer is calling subjobs and also sub sequencer jobs.
ETL report should generate the run times of the sequencers, subjobs and the stages/links in the subjobs at the project level.
I'm able to generate the xml, detail report on individual jobs through dsjob -report command, but wanted to generate this report at the project level. I can get the start and end times of the stages from the director/monitor, but taking into consideration the large number of jobs that we have, wanted to generate the report so that the jobs/stages/links running for long hours can be traced and can proceed with redesign for better performance.
Details on the ETL Report that I need.
We have some Sequencer Jobs and each sequencer is calling subjobs and also sub sequencer jobs.
ETL report should generate the run times of the sequencers, subjobs and the stages/links in the subjobs at the project level.
I'm able to generate the xml, detail report on individual jobs through dsjob -report command, but wanted to generate this report at the project level. I can get the start and end times of the stages from the director/monitor, but taking into consideration the large number of jobs that we have, wanted to generate the report so that the jobs/stages/links running for long hours can be traced and can proceed with redesign for better performance.
ETLstats do exactly all the above plus much more ....saikailas wrote:Thank you all very much for the inputs.
Details on the ETL Report that I need.
We have some Sequencer Jobs and each sequencer is calling subjobs and also sub sequencer jobs.
ETL report should generate the run times of the sequencers, subjobs and the stages/links in the subjobs at the project level.
I'm able to generate the xml, detail report on individual jobs through dsjob -report command, but wanted to generate this report at the project level. I can get the start and end times of the stages from the director/monitor, but taking into consideration the large number of jobs that we have, wanted to generate the report so that the jobs/stages/links running for long hours can be traced and can proceed with redesign for better performance.
We have done a simple job recently which creates a CSV which contains all jobs, catogroy, status start time and elapsed time.
the outline we have used is, query the universe in a server job for jobs and their category names and then pass them in a transformer through the separate routines which is having the following functions with proper error handling etc..
DSGetJobInfo(hJob1,DSJ.JOBSTATUS)
DSGetJobInfo(hJob1,DSJ.JOBSTARTTIMESTAMP)
DSGetJobInfo(hJob1,DSJ.JOBELAPSED)
Please note that we haven't done anything for stage/link level.
Hope this helps a little.
Accept that some days you're the pigeon and some days you're the statue.
I have built some nice tree controls using some of these jobs where you can drill down into a sequence and see all the jobs in that sequence and if there is another sequence drill down into it.
I think there is a a job dependency table. If not then I will be glad to add one. I like small sequences and to see how you started this sequence is difficult without this kind of table. There is a routine which will rebuild the job dependency relationship if your job properties is out of sync. I will post the code if anyone needs it.
There are also included several jobs which will list long running jobs. The reports are really nice and will email you their results if you get that part of EtlStats working.
I think there is a a job dependency table. If not then I will be glad to add one. I like small sequences and to see how you started this sequence is difficult without this kind of table. There is a routine which will rebuild the job dependency relationship if your job properties is out of sync. I will post the code if anyone needs it.
There are also included several jobs which will list long running jobs. The reports are really nice and will email you their results if you get that part of EtlStats working.
Mamu Kim