debugging core files

Archive of postings to DataStageUsers@Oliver.com. This forum is intended only as a reference and cannot be posted to.

admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

debugging core files

Post by admin »

Hi All,

How do I debug the core file that is generated during a job hang? I need to find the reason for the hang. By "hang" I mean that the process (phantom) does not show up in "ps", but the job status in the Director shows "Running" and the Monitor shows no change in processing. Also, where can I find the object code for the particular job that has hung? (Additional info: version 3.6 on AIX.)

Thanks and Regards
Karthik
Analyst
Deutsche Bank
singapore

Post by admin »

Use any core inspection tool you have available. However, without information about memory usage in DataStage (which is not published and will not be) you have no idea what you're looking at, so attempting to analyse core files is somewhat moot.

If the job still has a status of Running, you may find files in the &PH& subdirectory containing the name StageRun. These belong to the processes that execute the code produced by Transformer stages. There will also be a file for the job itself, with DSD.RUN as part of its file name.

The object for the job control and Transformer stage executables is found in RT_BPnn.O, where nn is the job number. This is compiled BASIC, not operating system executable code. What were you planning to do with it?
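As a rough illustration of the file check described above, the following Python sketch lists the files in a project's &PH& directory whose names contain StageRun or DSD.RUN, newest first. Only the &PH& directory name and the two name fragments come from the post; the function name and the idea of passing the project directory as an argument are assumptions made for this example.

```python
import os
from typing import List, Tuple

# Name fragments mentioned in the post; everything else here is illustrative.
PHANTOM_MARKERS = ("DSD.RUN", "StageRun")


def list_phantom_files(project_dir: str) -> List[Tuple[str, float]]:
    """Return (file name, modification time) pairs from <project_dir>/&PH&.

    The result is sorted with the most recently modified file first.
    """
    ph_dir = os.path.join(project_dir, "&PH&")
    found = []
    for name in os.listdir(ph_dir):
        path = os.path.join(ph_dir, name)
        # Keep regular files whose name contains one of the markers
        if os.path.isfile(path) and any(m in name for m in PHANTOM_MARKERS):
            found.append((name, os.path.getmtime(path)))
    found.sort(key=lambda item: item[1], reverse=True)
    return found
```

Calling `list_phantom_files` with the path of the project directory (a placeholder that depends on the installation) returns the candidate files to inspect for a job that still shows Running.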


Post by admin »

Is this hanging predictable or repeatable, i.e. does it always happen when you run a particular job?

Or, is it random, perhaps more likely to happen when there are a number of jobs running? Does this job have a large number of stages?


Post by admin »

Ray,

> Use any core inspection tool you have available. However, without
> information about memory usage in DataStage - which is not published
> and will not be - you have no idea what you're looking at, so
> attempting to analyse core files is somewhat moot. If the job still
> has a status of Running, you may find files in the &PH& subdirectory
> containing the name StageRun. These are the processes that execute
> the code produced by Transformer stages. There will also be a file
> for the job itself, with DSD.RUN as part of its file name. The object
> for the job control and Transformer stage executables is found in
> RT_BPnn.O, where nn is the job number. This is compiled BASIC - not
> operating system executable code. What were you planning to do with
> it?

We are facing this problem quite frequently, so I was curious whether I could find out if the problem lies with AIX (memory, permissions, etc.) or with DataStage. When we escalated the problem to support, they asked for the core file, so I thought I could try my hand at analysing the core file myself.

David,

>> Is this hanging predictable or repeatable, ie does it always happen
>> when you run a particular job? Or, is it random, perhaps more likely
>> to happen when there are a number of jobs running? Does this job
>> have a large number of stages?

The hanging is not predictable; it happens randomly on different jobs. To some extent, yes, hangs are more likely to occur when a number of jobs are running. At most we have around 10 stages. Since we run a large number of jobs in production, we cannot keep track of jobs that have been hanging for a long period of time, which in turn affects uptime.

Is there any way to crack this hang problem, or to find its cause? Is there any way to get into UniVerse to check for the dead ones? Does DataStage kill jobs that have hung (without updating the status, which still shows Running)? I ask because we could not find any process running for the hung job.

Thanks for your attention.

Thanks and Regards
Karthik
Analyst
Deutsche Bank
singapore
065-423-7410
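The symptom described above (status Running, but no process visible in "ps") can also be checked from a script. The sketch below uses the standard Unix technique of sending signal 0 to a process ID to test whether the process exists. How the process ID of a job's phantom is obtained is not covered in this thread, so the `pid` argument is assumed to be known already; the function name is hypothetical.

```python
import os


def process_exists(pid: int) -> bool:
    """Return True if a process with this PID currently exists (Unix only)."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    try:
        # Signal 0 sends nothing; the kernel only performs its error checks.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but is owned by another user
    return True
```

A job whose status is Running while `process_exists` returns False for its phantom matches the hang described in this post.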

Post by admin »

Karthik,

Informix support should be able to help you on this one. It sounds like a known bug relating to the creation/naming of work files in &PH&. Ask them about this one. If this is the case, then it is fixed in 4.0.1 (I think I have the version number right).

David


Post by admin »

The support folks have information about what should be where in memory, so it's reasonable that they would request core dumps for analysis. As I noted earlier, memory usage for/by DataStage processes is not public knowledge (and tends to change release by release).
