Log files of a Datastage Job
Moderators: chulett, rschirm, roy
rajeev_prabhuat
- Participant
- Posts: 136
- Joined: Wed Sep 29, 2004 5:56 am
- Location: Chennai
Hi,
I am working in DataStage 7.x with the Parallel Extender properties, and I am new to 7.x.
I developed 4 jobs, started them yesterday evening, and went home. Each job loads about 1 million rows. When I checked this morning, only 400,000 rows had been loaded and all 4 jobs had aborted. I tried to view the log for one of the jobs, but Director just says "Retrieving" and hangs, possibly because the log is huge.
My question: apart from Director, can we view the log manually? Is the log located on the server? If so, where in the directory structure is it?
Regards,
Rajeev Prabhu
ray.wurlod
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
DataStage job logs are located on the server.
However they are not log files, as you have assumed. They are database tables in the repository.
If you determine the job number of your job, from the DS_JOBS table, then the log for that job is a table called RT_LOGnnn, where nnn is the job number.
You can view them from the operating system using the command line interface dsjob with its -log... options.
You can view them from the Administrator client command window with a regular query, for example
Code: Select all
SELECT * FROM RT_LOGnnn;
Search the forum for more information about viewing and archiving DataStage job logs.
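As a sketch of the dsjob route (the project and job names here are examples, and exact options can vary by release; run dsjob with no arguments on your server to check):
Code: Select all
dsjob -logsum -type FATAL -max 20 MyProject MyJob
dsjob -logdetail MyProject MyJob 123
The first command lists a summary of up to 20 fatal events for the job; the second prints the detail of a single event by its event id.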
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rajeev_prabhuat
Hi,
Thanks, Ray.
But when I was going through the DataStage documents we received during installation, I saw that the logs are stored in hashed files. Is there any way to reduce the size of the log files?
Regards,
Rajeev Prabhu
ray.wurlod
Purge the log files of old entries on a regular basis. Enable auto-purge wherever possible.
Create job designs that don't generate warnings or unnecessary informational messages.
Periodically (ideally after purging), compress the internal space in the hashed files, either from the Administrator client command window or from the DataStage shell (dssh), using the command
Code: Select all
RESIZE RT_LOGnnn * * *
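A hedged sketch of doing this from the server command line (the project path, project name, and job number 26 are hypothetical; dssh normally lives under the engine directory $DSHOME/bin and should be started from the project directory):
Code: Select all
cd /u01/dsadm/Projects/MyProject
$DSHOME/bin/dssh
RESIZE RT_LOG26 * * *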
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rajeev_prabhuat
Hi,
Ray, as you said there is a log table for each job. Can we take the data from that table and load it into our Oracle table to monitor our jobs?
My requirement is this: we have 42 jobs, all one-to-one mappings, and we want to store each job's name, start time, end time, status, and the error message if it failed. That brought me back to what you said about the table in UniVerse. If this is possible, please let me know how to do it.
Regards,
Rajeev
ray.wurlod
Hashed file is a method of implementing database tables in DataStage, UniVerse, UniData and other databases.
Columns in the log files are:
Code: Select all
@ID        Numeric event identifier (or "//..." for control records)
TYPE       Enumerated severity level of message
WAVE.NO    Internal number to identify run instance
TIMESTAMP  Date and time when event logged
MSG.ARGS   Parameters to add to message text (multi-valued)
MSG.TEXT   Fixed part of message text
FULL.TEXT  MSG.TEXT with parameters substituted from MSG.ARGS
SEVERITY   TYPE decoded into a text string
The first three are INTEGER, the remainder are VARCHAR.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rajeev_prabhuat
Hi,
Ray, how can we access this table in UniVerse? Can we access it with the same user id and password, or is there another way?
Also, can I get the start and end time of the job from the TIMESTAMP column of the log? If so, that would be good.
Regards,
Rajeev Prabhu
ray.wurlod
You can create a DataStage job that retrieves the log information, using the metadata I supplied, through either a UV stage or a hashed file stage. If you use a hashed file stage, the key column should be called AT.ID rather than @ID. Either stage type can apply a constraint on any column, in the usual way.
From there you can do whatever you like with the contents of the log file.
If you feel up to the challenge, you can learn about the log interrogation functions from on-line help. They also have equivalents in the dsjob command line options.
- DSGetLogSummary
- DSGetNewestLogId
- DSGetLogEntry
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rajeev_prabhuat
Hi Ray,
If I am using DSGetLogSummary to get the log details, how do I get them? Should I write a routine?
Regards,
Rajeev Prabhu
ray.wurlod wrote: [...] If you are getting large log files, remember to purge them regularly. If a log reaches 2GB, and preventive action (switching to 64-bit internal addressing) has not been taken, the log file becomes corrupted and your job is guaranteed to abort.
ray.wurlod
You WILL need to write a routine to use the DSGetLogEntry() function. The function can only retrieve the details of a single logged event.
However, why use these functions in a DataStage job at all? Use a UV stage or hashed file stage to select from the log, and you automatically get everything useful that's in the log. Send this to any destination of your choice.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rajeev_prabhuat
Hi Ray,
What are the basic requirements to access the table through the UV database? I tried creating the metadata you described, but I could not get the select to work: it showed an error saying that LOG_RT26 (the log table for a particular job) is not available. I got that log name through the command line.
Regards,
Rajeev Prabhu
ray.wurlod
The fundamental requirement is to get the table name correct!
Code: Select all
SELECT TIMESTAMP, SEVERITY, FULL.TEXT FROM RT_LOG26 ;
rajeev_prabhuat
Hi Ray,
Thank you.
Yesterday I created a routine that gives the details of the job (job name, start time, end time, elapsed time, rows read, rows written, and job status). In it I declared an array variable to collect the DSGetLogSummary data, but it gives me error value -13. The assignment I used is:
LogEntries[10] = DSGetLogSummary(JobHandle,DSJ.LOGINFO,JobStarted,JobEnded,0)
I don't think this is the right way to get the details into the array. Can you tell me how to get the details into an array, and how to split them out?
Regards,
Rajeev Prabhu
ray.wurlod wrote: The fundamental requirement is to get the table name correct!
Code: Select all
SELECT TIMESTAMP, SEVERITY, FULL.TEXT FROM RT_LOG26 ;
ray.wurlod
Need to see your whole routine. Post it here, surrounded by Code tags (select the whole code, then click on the Code button above).
The JobHandle variable must have been assigned by an earlier DSAttachJob() call.
The result from DSGetLogSummary is a "dynamic array". Fields in dynamic arrays are referred to using angle bracket notation.
Result = DSGetLogSummary(...)
FirstField = Result<1>
SecondField = Result<2>
and so on.
You can also use the MATPARSE statement (refer to DataStage BASIC manual) to load the contents of a dynamic array into a dimensioned array. Dimensioned array elements are accessed with the index value in parentheses (not square brackets).
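To make that concrete, here is a hedged BASIC sketch (the job name, the DSJ.LOGINFO type, and the per-event field layout are assumptions; check the DSGetLogSummary topic in on-line help for the exact format on your release):
Code: Select all
* Attach to the job whose log we want to read (job name is an example)
JobHandle = DSAttachJob("MyJob", DSJ.ERRNONE)
Summary = DSGetLogSummary(JobHandle, DSJ.LOGINFO, JobStarted, JobEnded, 0)
* Summary is a dynamic array: one field per event, delimited by field marks
NumEvents = DCOUNT(Summary, @FM)
For I = 1 To NumEvents
   EventData = Summary<I>   ;* angle brackets extract one field
Next I
* Alternatively, load the dynamic array into a dimensioned array
DIM LogEntries(100)
MATPARSE LogEntries FROM Summary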
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rajeev_prabhuat
Hi Ray,
I am attaching the code for your perusal.
I hope you can give me some input on this.
Code: Select all
$IFNDEF JOBCONTROL.H
$INCLUDE DSINCLUDE JOBCONTROL.H
$ENDIF
ErrorCode = 0 ;* set this to non-zero to stop the stage/job
ReportType = 0
LogEntries = 0
DirName = InputArg
Suffix = ".txt"
JobHandle = DSJ.ME
JobName = DSGetJobInfo(JobHandle,DSJ.JOBNAME)
JobInvId = DSJobInvocationId
if JobInvId ne "" then
   JobName := ".":JobInvId
end
JobStarted = convert(" :-","_",DSGetJobInfo(JobHandle,DSJ.JOBSTARTTIMESTAMP))
JobEnded = convert(" :-","_",DSGetJobInfo(JobHandle,DSJ.JOBEOTTIMESTAMP))
Alias = ""
if JobInvId eq "" then
   Alias = DSGetIdForJob(JobName,ErrorCode)
end
if Alias ne "" then
   FileName = JobName:"_":Alias
end else
   FileName = JobName:"_":JobStarted
end
FileName := Suffix
ReportText = DSMakeJobReport(JobHandle,ReportType,"LF")
*- Getting the required counts from the Job Links
varSrcCount1 = DSGetLinkInfo(JobHandle,'Sequential_File_4','In',DSJ.LINKROWCOUNT)
LogEntries[10] = DSGetLogSummary(JobHandle,DSJ.LOGINFO,JobStarted,JobEnded,0)
varSrcCount2 = 0
varInsCount = 0
varSrcCount = varSrcCount1 + varSrcCount2
varUpdCount = 0
varDelCount = 0
Audit_info = '"1",'
Audit_info := '"':Field(trims(Field(ReportText,CHAR(10),3)),' ',5):'",'
Audit_info := '"':'LOADING':'",'
Audit_info := '"':right(trims(Field(ReportText,CHAR(10),5)),19):'",'
Audit_info := '"':right(trims(Field(ReportText,CHAR(10),6)),19):'",'
Audit_info := '"':right(trims(Field(ReportText,CHAR(10),7)),8):'",'
Audit_info := '"':right(Field(trims(Field(ReportText,CHAR(10),8)),' ',2),0):'",'
Audit_info := varSrcCount:','
Audit_info := varInsCount:','
Audit_info := varUpdCount:','
Audit_info := varDelCount:','
Audit_info := LogEntries[4]:'"'
Openpath DirName to t.fvar then
   write Audit_info to t.fvar, FileName else
      call DSLogWarn("Failed to write file ":FileName:" to directory ":DirName, "Routine_call_check")
   end
end else
   call DSLogWarn("Failed to open directory1 ":DirName, "Routine_call_check")
end
close t.fvar
*- Audit Code
AuditText = DSMakeJobReport(JobHandle,1,"LF")
AuditFileName := "Audit_":FileName
Openpath DirName to t.fvar then
   write AuditText to t.fvar, AuditFileName else
      call DSLogWarn("Failed to write file ":AuditFileName:" to directory ":DirName, "Routine_call_check")
   end
end else
   call DSLogWarn("Failed to open directory ":DirName, "Routine_call_check")
end
close t.fvar
Regards,
Rajeev Prabhu