What do &PH&, RT_LOG files and HASHFILE_10 contain?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

arun@sinmax
Premium Member
Posts: 13
Joined: Tue Jun 14, 2005 6:25 am

What do &PH&, RT_LOG files and HASHFILE_10 contain?

Post by arun@sinmax »

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The result? :shock: Hmmm, harken back to 1984 for a moment with me...

Stantz: Fire and brimstone coming down from the skies! Rivers and seas boiling!
Spengler: Forty years of darkness! Earthquakes, volcanoes...
Zeddemore: The dead rising from the grave!
Venkman: Human sacrifice, dogs and cats living together... mass hysteria!


On a more serious note, &PH& is the "phantom" directory which all of the DataStage jobs use to communicate their status, seeing as how they run in the background and "phantom" basically means "background" in this product. Goes back to the Prime days, if I remember my Wurlod Lore correctly. It should be kept fairly clean to avoid slowing things down unnecessarily, but simply "clearing" it could affect any currently running jobs. Set up a cron job to keep it pruned of things, say 2+ days old.

Those RT_LOGnnn hashed files are your job logs, marked with the internal job number of the job they are associated with. Use the Director, either manually or via auto-purge (or both), to keep them under control.

HASHFILE_10 sounds like a stray. I periodically look for strays in the project and clean them out. This happens when people "accidentally" create an account-based hashed file rather than a pathed one, or create a flat file with just a name in the stage rather than a full path, etc. Anything created with a relative path ends up in the job's project, because such paths are relative to the 'current working directory' of the process that creates them, and that CWD is the Project directory.

For HASHFILE_10, first assume it is account based (i.e. has a VOC record associated with it) and try a DELETE.FILE HASHFILE_10 from the Administrator's Command window connected to the project in question. If that no workie, delete it from the operating system.
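
For example, from the Administrator's Command window (a rough sketch using standard engine commands, with the file name from above):

    LIST VOC HASHFILE_10
    DELETE.FILE HASHFILE_10

LIST VOC shows whether a pointer record exists for it; DELETE.FILE then removes both the VOC record and the underlying file. If LIST VOC comes back empty, it is not account based, so remove it from the operating system instead.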

Overall, the process of deleting / clearing / removing anything from a Project directory should be undertaken with extreme caution and with good knowledge of the product. If you aren't certain you're looking at a stray from a wayward job, leave it alone. If you have no idea what it is, leave it alone. Needless to say, remove the wrong thing and it's "Umm, could someone restore the backup, please?" time.
-craig

"You can never have too many knives" -- Logan Nine Fingers
arun@sinmax
Premium Member
Posts: 13
Joined: Tue Jun 14, 2005 6:25 am

Post by arun@sinmax »

Hi craig,

I have cleared &PH& with the mandatory precautionary steps, as this should improve the health of the project and hopefully make the jobs run quicker.
The steps followed are (a rough shell equivalent is sketched after the list):
1.) Check that all jobs have been stopped
2.) Bring down the DSEngine
3.) Clear &PH&
4.) Start the DSEngine
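
In shell terms, the sequence is roughly as follows (a sketch only; it assumes a typical Unix install with $DSHOME set, and the project path is made up):

    . $DSHOME/dsenv                              # engine environment
    cd $DSHOME
    bin/uv -admin -stop                          # 2.) bring down the DSEngine, only after confirming no jobs are running
    rm -f "/u1/dsadm/Projects/MyProject/&PH&"/*  # 3.) clear &PH& (hypothetical project path)
    bin/uv -admin -start                         # 4.) restart the DSEngine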

I have also cleared an RT_LOG from the Administrator when required, i.e. when it has been corrupted.

I have two questions here:

Is it advisable to clear all the RT_LOG files once a month? What is the advantage?
Does clearing a hashed file from the Project path clear its data or not? Will it increase the performance of the hashed file?

THANKS
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Rather than all of those steps and an interruption on the server, simply set up a cron job to find and delete any file over two days old in the &PH& folder. Set it to run once a day, every day, and then forget about it.
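
For instance, a crontab entry along these lines (a sketch; the project path is made up and the find options may need tweaking for your flavour of Unix):

    # 02:00 daily: prune &PH& entries more than two days old
    0 2 * * * find '/u1/dsadm/Projects/MyProject/&PH&' -type f -mtime +2 -exec rm -f {} \; >/dev/null 2>&1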

No need to clear job logs on any kind of regular manual basis either. Simply ensure that the appropriate Auto-Purge settings are in place. If things Go Bad and you occasionally need to manually purge a large volume or handle a corrupted / blown hashed file, so be it, but it should be another thing you automate properly and forget. For the most part.

Clearing what "HASHED FILE"? If by "clearing" you mean issuing a CLEAR.FILE on an account-based hashed file, yes it removes all of the data much like truncating a database table. Note that it will not be "empty empty" as in zero bytes or with nothing in the directory but rather back to a default newly created size of 4K or so.
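
As a quick illustration from the Administrator's Command window (the hashed file name here is made up):

    CLEAR.FILE MYHASH

The file and its VOC pointer remain in place; only the data records are removed.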
-craig

"You can never have too many knives" -- Logan Nine Fingers
arun@sinmax
Premium Member
Posts: 13
Joined: Tue Jun 14, 2005 6:25 am

REASON BEHIND CLEARING &PH?HOW DOES THESE PHANTOM LOGS L

Post by arun@sinmax »

WHEN WE CLEAR THE &PH& FOLDER THE JOBS & PROJECTS RUN FINE. WHAT IS THE REASON? HOW DOES THE IMPACT TAKE PLACE?

WHEN A JOB FINISHES, DOES IT SEARCH FOR ANY KIND OF ENTRY OR READ THE ENTRIES OF THE PHANTOM LOGS IN THE &PH& FOLDER?

ON THE SERVER, THE SPACE OCCUPIED BY &PH& IS FOUND TO BE VERY NEGLIGIBLE. HOW WILL 2MB OF SPACE OCCUPIED BY &PH& IMPACT THE PROJECT?

PLEASE GUIDE ME
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

First off, posting in ALL CAPS is considered SHOUTING online. Please use your inside voice here. :?

The "phantom" directory in each project is &PH& and is used by the background job processes to communicate status with the engine. As noted, once the job completes and this communication has finished, you can safely remove the entries created. Clearing this area while jobs are running means you could break this process, hence the advice to prune only entries older than a rolling number of days, ones that you know are no longer needed.

And it's not so much about how much space these entries take up, although for some people / systems that may also be a consideration. It's their number you should be staying on top of.
-craig

"You can never have too many knives" -- Logan Nine Fingers
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Remember, DataStage never clears &PH& automatically, and it creates at least one file (possibly more) in there every time a job is run.

I was called in once by a client who said they were experiencing performance problems in their main project. It turned out they had never cleared &PH& in several years, and it had tens of thousands of entries in the directory. So each time a job ran, the engine had to load this VERY large directory structure into memory just to add the new filename at the end of the list. It was taking almost 5 minutes before a job would start!

There were so many files that rm couldn't delete them with wildcards - the expanded argument list was simply too long.
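
The workaround in that situation (a sketch, with a made-up project path) is to have find hand the names to rm one at a time instead of expanding a shell wildcard:

    cd '/u1/dsadm/Projects/MainProject/&PH&' && find . -type f -exec rm -f {} \;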

Once it was cleaned up, their problem went away.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020