ETL Job Statistics
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 55
- Joined: Tue Sep 20, 2005 10:58 am
ETL Job Statistics
Dear all,
+++++++++++++++++++++++++++
Main objective: to capture metrics/stats for all the jobs running in production and/or test.
+++++++++++++++++++++++++++
For the last two weeks or so I have been working with Kim's EtlStats package. Kim has been very generous with his time in responding to my emails while I was getting it set up.
After downloading it and following the instructions I got it working great. But I have a couple of questions about my approach to capturing ETL job statistics using Kim's DSJobReportDB job.
Here at Micron we have a mix of server and parallel jobs in production.
For good or bad, we also have a lot of sequences with several layers of nesting, i.e. a sequence calling a sequence calling a sequence calling a job.
Using Kim's DSJobReportDB job as the centerpiece of the solution, I tested an approach where the shell script that runs a job refers to an XML file describing the job hierarchy, loops through all the children of the requested job, and logs the stats for each job in the hierarchy using DSJobReportDB.
That worked great, since we had built the XML hierarchy file more or less by hand.
The challenge I am facing is automating the process of building that hierarchy.
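The XML-driven wrapper described above can be sketched roughly as follows. This is a minimal illustration, not the actual Micron script: the XML layout (nested `<job name="...">` elements), the sample job names, and the point where DSJobReportDB would be invoked are all assumptions for the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical hierarchy file: nested <job name="..."> elements.
# The element names and layout are invented for this sketch.
SAMPLE = """
<job name="SeqTop">
  <job name="SeqMid">
    <job name="JobLoad"/>
    <job name="JobXform"/>
  </job>
  <job name="JobExtract"/>
</job>
"""

def walk_jobs(elem):
    """Yield every job in the hierarchy, parents before children."""
    yield elem.get("name")
    for child in elem.findall("job"):
        yield from walk_jobs(child)

def jobs_under(xml_text, requested):
    """Return the requested job and all of its descendants, in call order."""
    root = ET.fromstring(xml_text)
    # Look for the requested job anywhere in the tree (root included).
    for elem in [root] + root.findall(".//job"):
        if elem.get("name") == requested:
            return list(walk_jobs(elem))
    return []

if __name__ == "__main__":
    for name in jobs_under(SAMPLE, "SeqMid"):
        # A real wrapper would call DSJobReportDB here for each job
        # (e.g. via dsjob; the exact invocation is not shown).
        print(name)
```

Running it for `SeqMid` prints `SeqMid`, `JobLoad`, `JobXform` in that order, which is the loop the shell script would drive.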
Further research and help from fellow DataStage consultants led me to the dssearch utility, which seems promising for getting the hierarchy of jobs and sequences called by a parent sequence.
But I was curious whether others see any issues with using dssearch.
To me the biggest concern is multi-instance jobs: dssearch does not capture the invocation id, so the call to DSJobReportDB will not log the per-instance statistics.
Please let me know of
1) a way to build a hierarchy of who calls whom, or
2) ideas for dealing with multiple instances with dssearch.
Feedback on the overall approach is also welcome.
thanks
arvind
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Thanks for all the nice words, but please note I do not normally answer emails. If you create this dictionary item then you can report on or extract the jobs from a sequence.
Code:
INSERT INTO DICT DS_JOBOBJECTS
(
FIELD,
CODE,
EXP,
NAME,
FORMAT,
SM
) VALUES (
'JobActJobName',
'D',
'12',
'JobActJobName',
'30L',
'S'
)
;
Code:
select
DS_JOBS.NAME SequenceJobName FMT '40L',
DS_JOBOBJECTS.NAME LinkName FMT '30L',
DependJobName
from
DS_JOBOBJECTS,
DS_JOBS
where
DS_JOBOBJECTS.OLETYPE = 'CJSJobActivity'
and DS_JOBOBJECTS.OBJIDNO = DS_JOBS.JOBNO
;
Last edited by kduke on Wed Dec 13, 2006 8:22 pm, edited 1 time in total.
Mamu Kim
If you take the previous query then you could create your own metadata. Now create a job and output the above to a table like:
Code:
Create Table MdSequenceJobs (
SequenceJobName varchar(60),
LinkName varchar(60),
DependJobName varchar(60)
)
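As a sketch of what that load job produces, here is the same table populated and queried in Python. SQLite and the sample row values are purely illustrative stand-ins; the real job would write the dictionary query's output to whatever target database holds the metadata.

```python
import sqlite3

# Rows shaped like the dictionary query's output:
# (SequenceJobName, LinkName, DependJobName). Sample values are invented.
rows = [
    ("SeqLoadWarehouse", "JobActivity1", "JobExtractOrders"),
    ("SeqLoadWarehouse", "JobActivity2", "JobLoadOrders"),
]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MdSequenceJobs (
        SequenceJobName varchar(60),
        LinkName        varchar(60),
        DependJobName   varchar(60)
    )
""")
conn.executemany("INSERT INTO MdSequenceJobs VALUES (?, ?, ?)", rows)

# The metadata can now answer "who calls whom":
for seq, link, dep in conn.execute(
        "SELECT * FROM MdSequenceJobs ORDER BY SequenceJobName, LinkName"):
    print(f"{seq} -> {dep} (via {link})")
```

Once this table exists, building the full call hierarchy is just a recursive walk over SequenceJobName/DependJobName.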
Mamu Kim
I have several jobs in EtlStats which create this kind of metadata in a target table. Try running these jobs. You will be amazed at how much good use you can make of these types of tables.
The Reporting Assistant cannot report on sequences or PX jobs. You can create your own metadata this way and report on it.
Mamu Kim
I forgot. You asked about instance ids.
Code:
INSERT INTO DICT DS_JOBOBJECTS
(
FIELD,
CODE,
EXP,
NAME,
FORMAT,
SM
) VALUES (
'JobAct2JobAct2InstanceIdsCount',
'I',
'field(@RECORD<14>, "/", 2)',
'JobAct2JobAct2InstanceIdsCount',
'3R',
'S'
)
;
Code:
INSERT INTO DICT DS_JOBOBJECTS
(
FIELD,
CODE,
EXP,
NAME,
FORMAT,
SM,
ASSOC
) VALUES (
'JobAct2InstanceIds',
'D',
'15',
'JobAct2InstanceIds',
'30L',
'M',
'MvJobAct2'
)
;
Code:
INSERT INTO DICT DS_JOBOBJECTS
(
FIELD,
CODE,
EXP
) VALUES (
'MvJobAct2',
'PH',
'JobAct2InstanceIds'
)
;
Code:
select
DS_JOBS.NAME JobName FMT '40L',
DS_JOBOBJECTS.NAME LinkName FMT '30L',
JobAct2JobAct2InstanceIdsCount,
JobAct2InstanceIds
from
unnest DS_JOBOBJECTS on MvJobAct2,
DS_JOBS
where
DS_JOBOBJECTS.OLETYPE = 'CJSJobActivity'
and JobAct2InstanceIds <> ''
and JobAct2JobAct2InstanceIdsCount > 0
and DS_JOBOBJECTS.OBJIDNO = DS_JOBS.JOBNO
order by
JobAct2InstanceIds
;
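If you read the raw record instead of going through the dictionary's unnest, the multivalued JobAct2InstanceIds field has to be exploded into one row per invocation id yourself. A minimal sketch, assuming the standard UniVerse value mark, CHAR(253), separates the multivalues (the sample invocation ids are invented):

```python
VALUE_MARK = chr(253)  # UniVerse value mark separating multivalues

def explode_instance_ids(raw_field):
    """Split a multivalued instance-id field into one id per invocation."""
    return [v for v in raw_field.split(VALUE_MARK) if v]

# A job activity running three invocations of one multi-instance job
# might store its ids like this (sample data, not a real repository read):
raw = VALUE_MARK.join(["Daily", "Weekly", "Adhoc"])
for invocation_id in explode_instance_ids(raw):
    # Each instance is addressed as JobName.InvocationId, which is what
    # a stats call per instance would need.
    print(invocation_id)
```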
Mamu Kim
To integrate this information into EtlStats you would need to read these fields directly. That would be difficult, plus it will break on version 8, but it might be useful in the meantime.
I extracted a lot of this information because of Parameter Navigator or PNav for short. When you take a project with no parameters and you fix it then you need to update the sequences as well. After a job goes from no parameters to 3 for each ODBC stage or 4 for every bulk load stage then you need to add each of these to all the sequences. We had over 500 jobs. We had over 100 sequences. After fixing all the jobs then you need to feed the parameters into all the job activities. All you need to do is automatch the job parameters based on parameter name. So I automated this in a job. This saved weeks of work. All these jobs are included when you purchase PNav. We have a bundled price on PNav and DwNav because both products work together to give you some nice metadata until version 8 comes out. Version 8 solves some of these issues.
Mamu Kim
-
- Participant
- Posts: 55
- Joined: Tue Sep 20, 2005 10:58 am
ETL stats
Thanks for all your responses
I will try out these suggestions and report back on how it went
Ray, currently our scheduling tool is ActiveBatch, but there is an effort underway to find another scheduling tool.
Will report back in the morning
thanks
arvind
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Sorry about that. My notes were a little jumbled! BMC Patrol is used by the operations people to monitor the systems, not for scheduling. Active Batch, as you say, is the scheduling tool. Currently.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.