Analysis of DataStage jobs

Post questions here relating to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

The DataStage log messages are stored in UniVerse tables. Each job has a folder in the DataStage project directory which holds these UniVerse tables. I don't know how to pluck the start and end time information of jobs from the internal tables, but I do know how to output the information from a routine.

Write a DataStage routine that outputs the job name, the start time and the end time of any job that calls it to a comma-separated sequential file. You can then call this routine from the After Job routine of each job you want to analyse.

There are some macros already installed with DataStage that you can use in this routine: DSJobName will give you the name of the job that called the routine, and DSJobStartDate and DSJobStartTime will give you the date and time the job started. The TimeDate() function will give you the time the after-job routine was run, i.e. the time the job finished.
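
A minimal sketch of such a routine (the routine name and the file path are assumptions, so adjust them to suit):

Code: Select all

* Hypothetical after-job routine: appends one comma-separated line with
* the job name, the start date/time and the finish time to a flat file.
SUBROUTINE JobTimesToCsv(InputArg, ErrorCode)
   ErrorCode = 0
   OPENSEQ "/tmp/job_times.csv" TO FileVar ELSE
      CREATE FileVar ELSE
         ErrorCode = 1
         RETURN
      END
   END
   SEEK FileVar, 0, 2 ELSE NULL    ;* move to the end of the file to append
   Line = DSJobName : "," : DSJobStartDate : " " : DSJobStartTime : "," : TimeDate()
   WRITESEQ Line TO FileVar ELSE ErrorCode = 1
   CLOSESEQ FileVar
RETURN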

You can then open this file in Excel.

Vincent McBurney
Data Integration Services
www.intramatix.com
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

The log information is held in the RT_LOGnnn file in the project directory, where nnn is the job number. This is a UniVerse file.

But you can also access these files with the dsjob command on the command line.
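
For example (the project and job names here are hypothetical):

Code: Select all

dsjob -jobinfo myproject MyJob
dsjob -logsum -max 20 myproject MyJob

The first reports the job status and start time; the second lists a summary of the most recent log entries for the job.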

If you need more details, let us know.

Wolfgang Huerter
=====================
Cologne, Germany
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

parshnem

The routines will not get the exact numbers shown in the monitor portion of the Director. The way this information is stored is very complicated. You can get it as explained above, but it is not accurate: your rows per second will not match the monitor. I think MetaStage can get this information and track it. I have the accurate information in my product DwNav. It will extract the information into Access locally on your PC, and you can easily push that into Excel.

I keep a history so you can look for trends. I store this information on the server in hash files. I plan on using this information to build restartable sequences. I do not like batch files, but you can create restartable batch jobs which start at the point of failure and do not rerun all the jobs. This is currently not a high priority for me, but I can easily move it up if someone wants to pay for it.

My process has to be called after the job finishes, or that data is lost. Just as in the Director, if you compile a job then the row counts are no longer available, because compiling wipes them out. There is a technical reason for this in the way the data is stored.

If you want accurate numbers, buy DwNav. If you want the BASIC way, then here are the calls:

Code: Select all

* JobName is assumed to have been set already (for example, passed in
* as a routine argument).
RunHandle = DSAttachJob(JobName, DSJ.ERRNONE)
JStat = DSGetJobInfo(RunHandle, DSJ.JOBSTATUS)
StartedDateTime = DSGetJobInfo(RunHandle, DSJ.JOBSTARTTIMESTAMP)
FinishedDateTime = DSGetJobInfo(RunHandle, DSJ.JOBLASTTIMESTAMP)
* Walk every stage in the job, then every link on each stage, picking
* up the row count for each link.
StageNames = DSGetJobInfo(RunHandle, DSJ.STAGELIST)
NoStages = Dcount(StageNames, ",")
For i = 1 To NoStages
   StageName = Field(StageNames, ",", i)
   StageType = DSGetStageInfo(RunHandle, StageName, DSJ.STAGETYPE)
   LinkNames = DSGetStageInfo(RunHandle, StageName, DSJ.LINKLIST)
   NoLinks = Dcount(LinkNames, ",")
   For j = 1 To NoLinks
      LinkName = Field(LinkNames, ",", j)
      RowCount = DSGetLinkInfo(RunHandle, StageName, LinkName, DSJ.LINKROWCOUNT)
   Next j
Next i
JStat = DSDetachJob(RunHandle)
There are a lot of steps in between to get rows per second and elapsed times, but this will get you close.
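
For instance, elapsed seconds and rows per second can be derived from the two timestamps above. A minimal sketch, assuming the timestamps come back formatted as "YYYY-MM-DD HH:MM:SS":

Code: Select all

* Convert the external timestamps into internal date and time values.
StartDays = Iconv(StartedDateTime[1, 10], "D-YMD[4,2,2]")
StartSecs = Iconv(StartedDateTime[12, 8], "MTS")
EndDays = Iconv(FinishedDateTime[1, 10], "D-YMD[4,2,2]")
EndSecs = Iconv(FinishedDateTime[12, 8], "MTS")
ElapsedSecs = (EndDays - StartDays) * 86400 + (EndSecs - StartSecs)
If ElapsedSecs > 0 Then RowsPerSec = RowCount / ElapsedSecs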

Buy DwNav. It will cost less than developing this code. There are a thousand other reasons to buy it as well.

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com

Last edited by kduke on Thu Jul 07, 2005 12:42 pm, edited 1 time in total.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

parshnem

I forgot to tell you that there is a free 30-day trial offer on DwNav. You have to go to www.datastagexchange.com and fill out the form, and we will send you an auth code, usually in an hour or two. You will need the auth code to make it work. It takes a few minutes to install, register and install the server programs; then you can download the row counts. It will save you a lot of time. It is a full working copy, so you can use all of it for 30 days for free. It is not expensive anyway. What do you have to lose?

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The underlying database is no longer called UniVerse, since IBM became the owner of UniVerse and the development of the two products (UniVerse and DataStage) began to diverge. The underlying database is now a part of the DataStage Engine, and has no specific name.
The names of the tables are theoretically Ascential Software's proprietary information, but they are out there for everyone to see. Be aware, though, that this gives Ascential the right to change them without notifying anybody.

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

After you install the DwNav components onto the DataStage server, does it start recording job history and row counts from that point on, or can it retrieve previous job history? Is the population of the history in the hash files automated, or does the user trigger it by requesting it from the DwNav client?

Vincent McBurney
Data Integration Services
www.intramatix.com
kduke
Charter Member
Charter Member
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Vincent

If you right-click on a job, it will retrieve that job's row history for all the runs that have been stored in the hash files. There is no way to go back and get earlier row history, because it is not stored anywhere. You can get all jobs as well; it just takes a little longer to calculate and download. It runs a server command called DWN_RT_ROW_COUNT.uvb, which takes one argument, the job name. So the hash files on the server are always maintained:

1. DWN_RT_JOB_STAT_HIST
2. DWN_RT_JOB_STAT
3. DWN_RT_ROW_COUNT_HIST
4. DWN_RT_ROW_COUNT
5. DWN_RT_PARAM_HIST
6. DWN_RT_PARAM

If the last run aborted, then it creates a record for that too. The files that end in HIST have a timestamp as part of the key; the others store only the last run's values. These hash files are also available if you want to push the row counts into Oracle or whatever. I have seen these published on the web, so if you run 24/7 you can see which job is active and for how long. This same code is in my other product, DsWebMon.

Ken has some cool ideas in a white paper about using this history for lots of projections and performance studies. Some of it was discussed here, so search for it.

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

I am still unclear on how the hash files are being populated. Is the DWN_RT_ROW_COUNT.uvb command executed when the user right-clicks on a job? Is it viable to run this command from job control code to ensure that the history of a job is updated every time the job is run?
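
For example, would something along these lines work from job control? A guess at the invocation, assuming the program is cataloged so the engine shell can run it by name:

Code: Select all

* Hypothetical job control fragment: run the DwNav server command
* for one job. "UV" tells DSExecute to use the engine shell.
JobToCollect = "MyJob"   ;* hypothetical job name
Call DSExecute("UV", "DWN_RT_ROW_COUNT.uvb " : JobToCollect, Output, SystemReturnCode)
If SystemReturnCode <> 0 Then
   Call DSLogWarn("Row count collection failed: " : Output, "JobControl")
End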

Vincent McBurney
Data Integration Services
www.intramatix.com
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

As pointed out here, the logs are stored in hash tables within UV (for lack of a better name) and can easily be accessed. What you do with this information is entirely up to you. You can load it into a relational DB, an Excel spreadsheet, a Word doc, an Access DB, etc. Anything that can be accessed via ODBC can be a target.

If you want to accumulate these stats on a per-job-run basis, then you might consider adding some sort of after-job routine to gather the stats from the current run and write the results to a target. You could even go so far as to write the results to a temp file and then load them into a target via DataStage (a sketch of this approach follows below). I have written many BASIC routines that write to almost every type of DB, file, etc. that exists, and the documentation on your CD explains how to use ODBC from BASIC.
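
As a starting point, here is a minimal sketch of such an after-job routine. The routine name, file path and CSV layout are assumptions; it uses DSJ.ME to interrogate the job that is running it:

Code: Select all

* Hypothetical after-job routine: appends one comma-separated line per
* link (job, stage, link, row count) for the run that just finished.
SUBROUTINE GatherRunStats(InputArg, ErrorCode)
   ErrorCode = 0
   OPENSEQ "/tmp/run_stats.csv" TO FileVar ELSE
      CREATE FileVar ELSE
         ErrorCode = 1
         RETURN
      END
   END
   SEEK FileVar, 0, 2 ELSE NULL    ;* append to the end of the file
   JobName = DSGetJobInfo(DSJ.ME, DSJ.JOBNAME)
   StageNames = DSGetJobInfo(DSJ.ME, DSJ.STAGELIST)
   For i = 1 To Dcount(StageNames, ",")
      StageName = Field(StageNames, ",", i)
      LinkNames = DSGetStageInfo(DSJ.ME, StageName, DSJ.LINKLIST)
      For j = 1 To Dcount(LinkNames, ",")
         LinkName = Field(LinkNames, ",", j)
         Rows = DSGetLinkInfo(DSJ.ME, StageName, LinkName, DSJ.LINKROWCOUNT)
         WRITESEQ JobName : "," : StageName : "," : LinkName : "," : Rows TO FileVar ELSE ErrorCode = 1
      Next j
   Next i
   CLOSESEQ FileVar
RETURN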

Most Ascential consultants have used these types of routines and have probably implemented this type of data gathering engine at most customer sites.

The above method also ensures that you gather all of the stats for a job, and it does not matter if the log is cleared, because you have written the results before the log is cleared.

Regards,

Michael Hester
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Michael

By the way, thanks. I learned this from you. Very important information.

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com