Capture file name

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


gdean
Participant
Posts: 24
Joined: Mon Feb 09, 2004 9:09 pm


Post by gdean »

Hi,

During the daily refresh of our warehouse, some of the DataStage jobs abort (they are still in development). We then find a huge file named "Capturexxxx" left behind in the temp dir. Since many jobs abort, we are not able to figure out which one is the source of this capture file.

1. From the name of the capture file left in the temp dir, is it possible to tell which job, or which stage type in particular, created it?

2. Is it true that every stage in a server job generates temp files with names unique to that stage? If so, can someone give a few examples?

Thank You,
Gregg
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The Capturexxx files are caused by the EXECUTE statement (which is invoked within such calls as DSExecute, ExecTCL, etc.). Other internal processes, such as sorting, can generate temporary files.
Part of the file name is the pid of the process that generated it. You need to track back to the records in &PH& and/or the job's status and log files to match these to jobs and stages.
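As a quick sketch of that tracking-back step, one might pull the pid portion out of the Capture file name before searching the &PH& records (the file name here is illustrative, and the grep target is commented out because it must be run inside a real project directory):

```shell
# Hypothetical example: extract the numeric pid portion of a Capture
# file name, then search the project's &PH& records for it.
f=Capture37292aa
pid=$(echo "$f" | sed -E 's/^Capture([0-9]+).*/\1/')
echo "$pid"    # 37292
# Inside the project directory (path illustrative):
# grep -l "$pid" '&PH&'/DSD.RUN* 2>/dev/null
```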

These files are placed in the directory referred to by the UVTEMP configuration parameter. This should point to somewhere with plenty of free disk space.

It is NOT true that every stage generates temp files with names unique to the stage. I don't know who told you that, but it's simply not correct. Most stage types don't create any temp files at all; it depends on what they're doing internally.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by gdean »

Thank you so much for the reply Ray.

Can you also please tell me how to read the records in the &PH& directory? I have searched the forum for more help on this but didn't find any information.

Appreciate your help.

Thank You,
Gregg

Post by ray.wurlod »

&PH& is a directory, but the ampersands in its name mean that it's tricky to get at; ampersand is a meaningful character in the shell (it means "start a background process"). To get there, you need either to quote the name (cd '&PH&') or escape the ampersands (cd \&PH\&).
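A quick demonstration of both forms, using a scratch directory under /tmp so it is safe to try outside a real project:

```shell
# A directory whose name contains ampersands must be quoted or escaped,
# otherwise the shell treats & as "run in background".
mkdir -p /tmp/ph_demo && cd /tmp/ph_demo
mkdir -p '&PH&'
cd '&PH&' && pwd      # quoted form works
cd ..
cd \&PH\& && pwd      # escaped form reaches the same directory
```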

All files in the &PH& directory are text files, so can be read with any text viewer. There are three main kinds of file in &PH&; the file name signifies which kind a file is.
  • A file with a name beginning "DSD.RUN" is a record of a server job run or job sequence run. Ordinarily it records the start and exit of child processes ("phantom" means "background process") and their associated process IDs.

  • A file with a name beginning "DSD.StageRun" is the record of a process that is executing, or has executed, the code for a Transformer stage. These are supposed to be cleaned up automatically when the job finishes, but may be left around if the job aborts. Resetting the job will move the information in such files to the job log (as "from previous run" entries) before deleting the files.

  • A file with a name ending in "trace" is the result of enabling stage tracing when running a server job. The file name will include the name of the job and the name of the stage that was traced.

The numeric components of file names in the &PH& directory are the internal-format date and time at which the phantom process was started.

If you create DataStage BASIC code that starts any other phantom processes, a log file for these is also kept in &PH& and is not removed automatically; cleaning these up is your responsibility.

It is best practice to purge &PH& periodically of old entries, as the presence of very many entries can slow startup times of jobs as the next free slot in the directory structure is sought.
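A housekeeping sketch along those lines, demonstrated here against a scratch directory so the commands are safe to try (the real target would be the project's &PH& directory, and the 7-day retention is an assumption to adjust to your site's needs):

```shell
# Create a scratch &PH& with one backdated and one fresh file.
PH_DIR='/tmp/ph_purge_demo/&PH&'
mkdir -p "$PH_DIR"
touch -t 202001010000 "$PH_DIR/DSD.RUN.old"   # backdated well past 7 days
touch "$PH_DIR/DSD.RUN.recent"
# Purge entries older than 7 days; -print lists what is removed.
find "$PH_DIR" -type f -mtime +7 -print -delete
ls "$PH_DIR"    # only DSD.RUN.recent remains
```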

Post by gdean »

Ray,

Thank you very much. All the information you have given is very useful to me.

Gregg

Post by gdean »

Hello Ray,

I have tried everything you said earlier to locate the job that created the Capture file. The file name was "Capture37292aa". I searched all the files in the &PH& dir for the pid 37292, but I didn't find any record with that pid.

The main problem is that whenever such a capture file is generated our repository gets corrupted (we can't see any jobs in the Designer) and we have to do a "Clean up" of the project to continue working. This also affects other developers who are working with DataStage at the time.

Do you have any idea why this is happening? How can I avoid the effect the capture file is having on other projects as well?

Thank you in advance,
Gregg

Post by ray.wurlod »

EXECUTE starts a new shell, and therefore a new pid is allocated. The pid of its caller is not available (alas). So the pid in the Capturexxxxxaa file may not have a record in &PH&.

Another possibility is that you are on a system where pids can be larger than five digits. Only the rightmost five digits of the pid appear in the Capturexxxxxaa file name. This may help.
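To illustrate the truncation (the six-digit pid here is invented for the example):

```shell
# A pid wider than five digits contributes only its last five digits
# to the Capturexxxxxaa file name, so pid 137292 would also produce
# a name matching "Capture37292aa".
pid=137292
suffix=$(printf '%05d' $((pid % 100000)))
echo "Capture${suffix}aa"    # Capture37292aa
```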

Other than that, you could do an export of all job designs and routines, then search through the DSX file for all instances of EXECUTE, DSExecute (and, possibly, ExecSH, ExecDOS and ExecTCL) to try to find any likely culprit.
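The search over the export might look like this; a tiny stand-in .dsx is created here so the example is self-contained (in practice you would point grep at your real export file):

```shell
# Create a small stand-in for an exported .dsx, then search it for
# the calls that can leave Capture files behind.
cat > /tmp/project_export.dsx <<'EOF'
Call DSExecute("UNIX", "ls -l", Output, RetCode)
Call DSU.ExecSH("df -k", RetCode)
EOF
grep -n -E 'EXECUTE|DSExecute|ExecSH|ExecDOS|ExecTCL' /tmp/project_export.dsx
```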

It is not normal behaviour that using this command "corrupts the Repository". What form does this "corruption" take? Maybe you need to determine whether the indexes have become corrupted, or maybe you're just waiting for some resource to be released. As a first step try these:

Code: Select all

LIST.INDEX DS_JOBOBJECTS ALL
UVFIXFILE DS_JOBOBJECTS TRACE ALL VLEVEL 1
LIST.READU
Look for "rebuild required" in the LIST.INDEX report, and for any error messages in the UVFIXFILE report.

Post by gdean »

Hi Ray,

I fully understand now what you meant by "the pid in the Capturexxxxxaa file may not have a record in &PH&". I deliberately broke my job and observed all the capture files being generated, then went back and checked whether the pids of at least some of the capture files could be found in the &PH& files. Now I understand what is happening.

The only solution I can think of is to make sure that whenever a capture file stays in the temp dir for more than a couple of hours, the admin is informed by email. Or is there something better to do?
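A minimal sketch of that watchdog idea, demonstrated on a scratch directory with a backdated file (the mail step is commented out, and the mailx command and address are assumptions to adapt to your site):

```shell
# List Capture files untouched for more than two hours (120 minutes).
TMP_DEMO=/tmp/capture_watch_demo
mkdir -p "$TMP_DEMO"
touch -t 202001010000 "$TMP_DEMO/Capture37292aa"   # backdated demo file
stale=$(find "$TMP_DEMO" -name 'Capture*' -mmin +120)
[ -n "$stale" ] && printf 'Stale capture files:\n%s\n' "$stale"
# [ -n "$stale" ] && printf '%s\n' "$stale" | mailx -s 'Stale Capture files' admin@example.com
```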

Coming to the corruption of the repository: I have executed the commands you listed and didn't find any problems there. What happens is that suddenly, in Designer/Director/Manager, we don't see any of our jobs. Sometimes we see the jobs but not the table definitions. I have noticed that this happens only, and immediately, after a job aborts and a capture file is left in the temp space.

Maybe I'm not completely correct in saying that the Capture file is the cause and the repository corruption is the effect. As you said, maybe we were just waiting for some resources to be released. But the question now is: what resources? I am afraid there might be a bigger issue than we are perceiving.

Thanks again,
-Gregg