ETL Job Statistics

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
aramachandra
Participant
Posts: 55
Joined: Tue Sep 20, 2005 10:58 am

ETL Job Statistics

Post by aramachandra »

Dear all

+++++++++++++++++++++++++++

Main objective : To capture metrices or stats for all the jobs running on production and or test.

+++++++++++++++++++++++++++

The last two weeks or so i have been working the Kim's ETLSTATS program. Kim has been very kind with his time in responding to my emails while i was getting it setup.

After downloading and following the instructions I got it to work great. But I have a couple of questions regarding my approach to capturing ETL job statistics using Kim's DBJobReportDb.

Here in Micron we have mix of server and parallel jobs in production.

For good or bad, we also have a lot of sequences with several layers of nesting...i.e sequence calling a sequence calling a sequence calling a job.

Using Kims DSJobReportDB job as the center piece of the solution I was testing a solution where our shell script that runs the job will refer an XML file that has the hierarchy of the jobs and will loop through all the children of the job that was requested to be run and log the STATS for each of the job in the hierarchy using DSJobReportDB.

That worked great as we had put together a XML file almost manually to build the hierarchy of jobs being called

The challenge i am facing is to automate the process of building the hierarchy.

Further research and help from fellow datastage consultants led me to dssearch utility

That seems to be promising in terms of getting me the hierarchy of the jobs and sequences being called by a parent sequence.

But I was curious if others saw any issue with using ds search.

To my the biggest concern is the Multiple instance jobs. The dssearch utility does not of course capture the invocation id and hence the call to DSJobReportDB will not log the instance statistics.

Please let me know of

1) way to build a hierarchy of who calls whom

2) or ideas of dealing with multiple instances with dssearch

Please let me know of the feedback on the overall approach

thanks
arvind
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It might be worth mentioning that the enterprise scheduling tool currently in use is BMC Patrol, if I remember correctly.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Thanks for all the nice words but please I do not normally answer emails. If you create this dictionary item then you can report or extract the jobs from a sequence.

Code: Select all

INSERT INTO DICT DS_JOBOBJECTS 
(
   FIELD, 
   CODE, 
   EXP, 
   NAME, 
   FORMAT, 
   SM
) VALUES (
   'JobActJobName', 
   'D', 
   '12', 
   'JobActJobName', 
   '30L', 
   'S'
)
; 

Code: Select all

select
   DS_JOBS.NAME SequenceJobName FMT '40L',
   DS_JOBOBJECTS.NAME LinkName FMT '30L',
   DependJobName
from 
   DS_JOBOBJECTS,
   DS_JOBS
where 
   DS_JOBOBJECTS.OLETYPE = 'CJSJobActivity'
   and DS_JOBOBJECTS.OBJIDNO = DS_JOBS.JOBNO 
; 
Last edited by kduke on Wed Dec 13, 2006 8:22 pm, edited 1 time in total.
Mamu Kim
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

If you take the previous query then you could create your own metadata. Now create a job and output the above to a table like:

Code: Select all

Create Table MdSequenceJobs (
   SequenceJobName   varchar(60),
   LinkName          varchar(60),
   DependJobName     varchar(60)
)
Mamu Kim
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

I have several jobs in EtlStats which create this kind of metadata in a target table. Try to run these jobs. You will be amazed on how much good use you can make out of these types of tables.

The Reporting Assistant cannot report on sequences or PX jobs. You can create your own metadata this way and report on it.
Mamu Kim
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

I forgot. You asked about instance ids.

Code: Select all

INSERT INTO DICT DS_JOBOBJECTS 
(
   FIELD, 
   CODE, 
   EXP, 
   NAME, 
   FORMAT, 
   SM
) VALUES (
   'JobAct2JobAct2InstanceIdsCount', 
   'I', 
   'field(@RECORD<14>, "/", 2)', 
   'JobAct2JobAct2InstanceIdsCount', 
   '3R', 
   'S' 
)
; 

Code: Select all

INSERT INTO DICT DS_JOBOBJECTS 
(
   FIELD, 
   CODE, 
   EXP, 
   NAME, 
   FORMAT, 
   SM, 
   ASSOC
) VALUES (
   'JobAct2InstanceIds', 
   'D', 
   '15', 
   'JobAct2InstanceIds', 
   '30L', 
   'M', 
   'MvJobAct2'
)
; 

Code: Select all

INSERT INTO DICT DS_JOBOBJECTS 
(
   FIELD, 
   CODE, 
   EXP
) VALUES (
   'MvJobAct2', 
   'PH', 
   'JobAct2InstanceIds'
)
; 

Code: Select all

select
   DS_JOBS.NAME JobName FMT '40L',
   DS_JOBOBJECTS.NAME LinkName FMT '30L',
   JobAct2JobAct2InstanceIdsCount,
   JobAct2InstanceIds
from 
   unnest DS_JOBOBJECTS on MvJobAct2,
   DS_JOBS
where 
   DS_JOBOBJECTS.OLETYPE = 'CJSJobActivity'
   and JobAct2InstanceIds <> ''
   and JobAct2JobAct2InstanceIdsCount > 0
   and DS_JOBOBJECTS.OBJIDNO = DS_JOBS.JOBNO 
order by
   JobAct2InstanceIds
; 
Mamu Kim
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

To integrate this information into EtlStats then you would need to read fields these directly. That would be difficult plus it will break on version 8 but it might be useful in the mean time.

I extracted a lot of this information because of Parameter Navigator or PNav for short. When you take a project with no parameters and you fix it then you need to update the sequences as well. After a job goes from no parameters to 3 for each ODBC stage or 4 for every bulk load stage then you need to add each of these to all the sequences. We had over 500 jobs. We had over 100 sequences. After fixing all the jobs then you need to feed the parameters into all the job activities. All you need to do is automatch the job parameters based on parameter name. So I automated this in a job. This saved weeks of work. All these jobs are included when you purchase PNav. We have a bundled price on PNav and DwNav because both products work together to give you some nice metadata until version 8 comes out. Version 8 solves some of these issues.
Mamu Kim
aramachandra
Participant
Posts: 55
Joined: Tue Sep 20, 2005 10:58 am

ETL stats

Post by aramachandra »

Thanks for all your responses

I will try out these suggestions and report back on how it went

Ray.. currently our scheduling tool is ActiveBatch but there is an effort to find a another scheduling tool.

Will report back in the morning


thanks
arvind
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Sorry about that. My notes were a little jumbled! BMC Patrol is used by the operations people to monitor the systems, not for scheduling. Active Batch, as you say, is the scheduling tool. Currently.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply