Page 1 of 1

How can I generate ETL stats on all jobs in a project?

Posted: Fri Mar 12, 2010 8:25 am
by saikailas
I was looking at DSXchange to find the approach to generate the report on datastage project with the run times of each sequencer, sub jobs and the stages in the individual jobs. The details provided in the discussion forums are very helpful and make the work easier.
I ran the dsjob - report on a parallel job and i dont see the information regarding dataset stages.

I wanted to know if I can use the ETL stats to generate the run time stats for all the jobs in a project on Version 8x ( parallel jobs) or if there is any other utility to accomplish this. Please let me know if any steps that I should follow.
Thanks

[* Note - Title made more descriptive - Andy *]

Posted: Fri Mar 12, 2010 9:54 am
by kduke
Modify stage seems to block row counts. I have never seen an issue on datasets. The jobs need to be modified to get row counts for PX jobs. Vincent and others have modified it so it works fine. The issue is PX store row counts at the partition level. So you either need to aggregate all the partitions or modify ETL_ROW_HIST to have partition id as part of the key.

The Driver job can take a job name, sequence name, folder name or 'ALL' and will get row counts for one job or all jobs in a sequence, folder or just all jobs. Big project can run a hour or more. Very useful though.

Posted: Fri Mar 12, 2010 9:58 am
by kduke
Someone with version 8 needs to respond to how it works in that version. I do not have v8 to try it.

If someone fixes these jobs and wants to post their fix for PX along with table changes then zip up your dsx and ddl and email it to me. I will post it on my web site. I will consider posting any other generic job or script someone might think is useful. Make sure your name is in the job properties and script comments. I do not want to take credit for another persons work.

Posted: Fri Mar 12, 2010 7:33 pm
by sohasaid
Saikailas & kduke,

would you please share any posts or topics which can guide me to generate such ETL status report for project jobs because i'm quite new to this & I need it urgently?

Regards.

Posted: Sat Mar 13, 2010 7:36 am
by chulett
Seems like you might not be aware that ETLStats is a pre-built package of jobs / scripts that Kim Duke create and donated to the community via his website. You'll probably need / want to tweak or extend it and you may have issues with 8.x jobs but at least it will give you a huge leg up with your 'urgent' need.

Posted: Sat Mar 13, 2010 2:04 pm
by sohasaid
Thanks chulett, you're right I've no previous idea about this feature in DataStage and it's urgent as the client needs such an ETL report and I need to know how to implement first, and how much time it will take.

Thanks again.

Posted: Sat Mar 13, 2010 2:36 pm
by ray.wurlod
Do you have Metadata Workbench?

Posted: Sat Mar 13, 2010 4:29 pm
by saikailas
Ray,
We do not have Metadata Workbench.

Posted: Sun Mar 14, 2010 5:08 pm
by John Smith
sohasaid wrote:Thanks chulett, you're right I've no previous idea about this feature in DataStage and it's urgent as the client needs such an ETL report and I need to know how to implement first, and how much time it will take.

Thanks again.
sorry to disappoint you but in this forum nothing is urgent as everyone is just a volunteer. How long is a piece of string? You have not specified your requirements and even then it depends on who is doing the job. An expert could do it in less time than a beginner. btw you should also have courtesy to start a new thread and not hijack someone else's thread.

Posted: Mon Mar 15, 2010 4:37 am
by JoshGeorge
XMETA (Repository) will be the right place to look for to generate such a report in 8x versions (Infosphere) especially if logging on to XMETA is switched on.

Posted: Mon Mar 15, 2010 9:20 am
by chulett
Before deciding where all this should come from, it would help to know what exactly it is that you need in this 'ETL report'. Do you? Can you articulate it here for us? :?

Posted: Mon Mar 15, 2010 10:29 am
by kduke
Craig is correct there are lots of ways to get this information. Tell us what you need and why and maybe we can help you better.

XMETA is fine for the last job run but you need to setup Metadata Workbench to get a history of row counts. Do a search there are many topics about all this.

Posted: Mon Mar 15, 2010 12:57 pm
by saikailas
Thank you all very much for the inputs.

Details on the ETL Report that I need.
We have some Sequencer Jobs and each sequencer is calling subjobs and also sub sequencer jobs.
ETL report should generate the run times of the sequencers, subjobs and the stages/links in the subjobs at the project level.
I'm able to generate the xml, detail report on individual jobs through dsjob -report command, but wanted to generate this report at the project level. I can get the start and end times of the stages from the director/monitor, but taking into consideration the large number of jobs that we have, wanted to generate the report so that the jobs/stages/links running for long hours can be traced and can proceed with redesign for better performance.

Posted: Mon Mar 15, 2010 1:30 pm
by dsedi
saikailas wrote:Thank you all very much for the inputs.

Details on the ETL Report that I need.
We have some Sequencer Jobs and each sequencer is calling subjobs and also sub sequencer jobs.
ETL report should generate the run times of the sequencers, subjobs and the stages/links in the subjobs at the project level.
I'm able to generate the xml, detail report on individual jobs through dsjob -report command, but wanted to generate this report at the project level. I can get the start and end times of the stages from the director/monitor, but taking into consideration the large number of jobs that we have, wanted to generate the report so that the jobs/stages/links running for long hours can be traced and can proceed with redesign for better performance.
ETLstats do exactly all the above plus much more ....

We have done a simple job recently which creates a CSV which contains all jobs, catogroy, status start time and elapsed time.
the outline we have used is, query the universe in a server job for jobs and their category names and then pass them in a transformer through the separate routines which is having the following functions with proper error handling etc..
DSGetJobInfo(hJob1,DSJ.JOBSTATUS)
DSGetJobInfo(hJob1,DSJ.JOBSTARTTIMESTAMP)
DSGetJobInfo(hJob1,DSJ.JOBELAPSED)
Please note that we haven't done anything for stage/link level.

Hope this helps a little.

Posted: Mon Mar 15, 2010 4:26 pm
by kduke
I have built some nice tree controls using some of these jobs where you can drill down into a sequence and see all the jobs in that sequence and if there is another sequence drill down into it.

I think there is a a job dependency table. If not then I will be glad to add one. I like small sequences and to see how you started this sequence is difficult without this kind of table. There is a routine which will rebuild the job dependency relationship if your job properties is out of sync. I will post the code if anyone needs it.

There are also included several jobs which will list long running jobs. The reports are really nice and will email you their results if you get that part of EtlStats working.