
Project Directory Growing Big

Posted: Wed Jan 05, 2005 4:01 pm
by poorna_76
Hi,

Our project directory RootDir\Ascential\DataStage\Projects\DEV is growing too big.

We think the cause may be temporary/intermediate files created during the project.

Can anybody tell us which folders (or files in those folders) we can safely delete, that are of no use and will not have any effect on the project?

I mean something like clearing the &PH& folder.

Thanks in Advance.

Re: Project Directory Growing Big

Posted: Wed Jan 05, 2005 4:56 pm
by ogmios
A hint: back up your project. Delete the jobs, reimport them and reschedule them (if that's required for a development environment). The problem you're probably having is that the project log hash files are becoming too big, as they only grow and never shrink.
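
If you want to script that backup, the Windows client also ships a command-line exporter. Treat this as a hedged example, as the exact flags vary by version; the host, user, password and path here are placeholders:

Code:

dscmdexport /H=myserver /U=dsadm /P=secret DEV C:\backup\DEV.dsx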

For the rest: if you really do create temporary files in the project directory, you can erase them, but back up your project before going down that path... one slip-up and you "lose" your project.

Ogmios.

P.S. If you do create temporary files in the project directory, rewrite the jobs to put them somewhere else. Your own files in the project directories are a nightmare for DataStage cold installs (when migrating to a new machine, for example).

P.P.S. And of course you can clean up the &PH& directory, but you have already automated that. I have not yet seen one DataStage site that does not clean up &PH& :wink:
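
For reference, the usual way to do that by hand is from the Administrator's Command window; just make sure no jobs are running at the time, since active jobs write their phantom output there:

Code:

CLEAR.FILE &PH&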

Posted: Wed Jan 05, 2005 10:15 pm
by T42
Log files DO shrink. You do know about the Administrator's option to clean up old logs for each project, right?

Posted: Wed Jan 05, 2005 11:31 pm
by kcbland
Are you sure log files shrink? Clearing a job log actually deletes rows one at a time; it does not use a truncate-type statement. Therefore, if a log file is a DYNAMIC hashed file, the only way to shrink it completely is either to issue a "CLEAR.FILE" command (recovering it back to the minimum modulus and emptying the overflow) or to delete the file and recreate it.

I did a quick test and confirmed that the data section does shrink, but the overflow stays inflated.
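
To illustrate the first option: issued from the Administrator's Command window (nnn is the job's number), the command below empties the file completely. Note that, unlike the Director's Clear Log, it also removes the log's control records, so use it with care.

Code:

CLEAR.FILE RT_LOGnnn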

Posted: Wed Jan 05, 2005 11:50 pm
by vmcburney
You also need to get your data and intermediate files out of the project directory and into a dedicated data directory. Are you using the localuv account to store your hash files? Are you writing intermediate files to a subdirectory of your project? Put all files and hash files used by the jobs into a separate area and you will find it more manageable.

Posted: Thu Jan 06, 2005 1:01 am
by ray.wurlod
Space within a dynamic hashed file can also be reclaimed using the RESIZE command (but not below the number of groups specified by the MINIMUM.MODULUS parameter).

Code:

RESIZE RT_LOGnnn * * *

The three asterisks mean "do not change any of the tuning parameters".
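
In case it helps with the nnn: it is the job's number, which (assuming the standard DS_JOBS dictionary with its JOBNO field) can be looked up at TCL; "MyJobName" is a placeholder:

Code:

LIST DS_JOBS WITH NAME = "MyJobName" JOBNO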

Posted: Thu Jan 06, 2005 6:57 am
by ogmios
How it seems to work "from outside the box" is that the log hash files stay at the biggest size they ever reached. Once you've got an enormous log file it will stay enormous afterwards, no matter how many "clear log"s you do, unless you do a RESIZE.

Maybe an enhancement request for Ascential: do an automatic RESIZE of the log hash files after executing "Clear log"; it can't be that hard to implement.

The easiest solution I've found for big projects is to export and import again (especially in a development environment). A BASIC job could also be written to go through the log hash files, do a "clear log" (not deleting the control records) and RESIZE them, but I've never bothered to implement it.

Ogmios
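
For what it's worth, a minimal sketch of the BASIC job Ogmios describes might look like this (untested, and the VOC selection pattern is an assumption; run it when no jobs are active). It keeps the control records, whose ids start with "//", and resizes each log afterwards:

Code:

* Sketch only: purge and resize every RT_LOG file in the project,
* keeping control records (ids starting "//") as the Director's Clear Log does.
EXECUTE 'SELECT VOC WITH F1 LIKE "F..." AND WITH @ID LIKE "RT_LOG..."' CAPTURING DUMMY
LOOP
WHILE READNEXT LOG.ID DO
   OPEN LOG.ID TO F.LOG THEN
      SSELECT F.LOG TO 1
      LOOP
      WHILE READNEXT LOG.KEY FROM 1 DO
         IF LOG.KEY[1,2] # "//" THEN DELETE F.LOG, LOG.KEY
      REPEAT
      F.LOG = '' ;* reassigning the variable releases the file handle
      EXECUTE 'RESIZE ' : LOG.ID : ' * * *' CAPTURING DUMMY
   END
REPEAT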

Re: Project Directory Growing Big

Posted: Thu Jan 06, 2005 8:05 am
by datastage
ogmios wrote:A hint: backup your project. Delete the jobs, reimport them and reschedule them (if that's required for a development environment). The problem you're probably having is that the project log hash files are becoming too big as they only grow and never shrink.

Ogmios.
Is there a technical difference between doing a (1) backup, (2) delete jobs, (3) reimport versus just doing a (1) backup, (2) reimport? I know personally I would have just done the export/import and not bothered with deleting jobs as a middle step.
ogmios wrote:P.P.S. And of course you can clean up the &PH& directory but you have already automated that. I have not yet seen one DataStage site that does not cleanup &PH& :wink:
I have, but it's always a site that is new to DataStage and doesn't have any DataStage developers with years of experience.

Posted: Thu Jan 06, 2005 8:12 am
by datastage
vmcburney wrote:You also need to get your data and intermediate files out of the project directory and into a dedicated data directory. Are you using the localhost account to store your hash files? Are you writing intermediate files to a subdirectory of your project? Put all files and hash files used by the jobs into a seperate area and you will find it more manageable.
In the DataStage 3.x and 4.x era, I used to prefer having staging and hashed files within the project directory. Maybe everyone's data sets were much smaller then, but certainly today's best practice is as Vincent mentions here.

Also, I know there is a good thread out there with some easy ways to create individual file pointers to hashed files stored outside of the localuv account, but does anyone have a method or script to automate doing this for all hashed files?
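
I don't have a polished script, but a rough sketch (untested; the directory path is an example, and it assumes everything under that directory is a hashed file) could read the OS directory as a directory file and SETFILE each entry:

Code:

* Sketch: create a VOC pointer for every hashed file under /data/hash.
* Add filtering if the directory holds anything other than hashed files.
HASH.DIR = '/data/hash'
OPENPATH HASH.DIR TO F.DIR ELSE STOP 'Cannot open ' : HASH.DIR
SELECT F.DIR TO 1
LOOP
WHILE READNEXT FILE.NAME FROM 1 DO
   EXECUTE 'SETFILE ' : HASH.DIR : '/' : FILE.NAME : ' ' : FILE.NAME : ' OVERWRITING' CAPTURING DUMMY
REPEAT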

Posted: Thu Jan 06, 2005 9:19 am
by poorna_76
Thanks for all the ideas.

I want to mention here the things we are doing & not doing.

We are writing all the intermediate files to a separate directory, outside the project directory.

We are not creating the hashed files outside of UV.
Is there a way we can delete the hashed files that are created temporarily during the project?
I mean the ones that get created every time we run a job.


During our project development, we have sometimes missed specifying the temp directory for the Sort stage.
The result is that we have a lot of soa folders.


Thanks

Deleting Temporary Hashed Files

Posted: Thu Jan 06, 2005 3:14 pm
by ray.wurlod
Is there a way we can delete the hash files that are created temporarily during the project?

Yes, it's called designing your job streams so that the hashed files are deleted.

Probably the easiest method is to use the Administrator client (Command window) to execute the commands to delete the hashed files. Then multi-select all those commands and choose Save, to save the list of commands under a single name.
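
The saved commands themselves would typically be DELETE.FILE statements; the file names below are made up for illustration:

Code:

DELETE.FILE TempHash1
DELETE.FILE TempHash2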

That single name can subsequently be executed, for example via the after-job subroutine ExecTCL or from a call to DSExecute, to delete all the hashed files.
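
As an illustration of the DSExecute option (the saved name DELETE.TEMP.HASHED is made up; substitute whatever name you saved the command list under):

Code:

* After-job routine fragment: execute the saved command list.
Call DSExecute("UV", "DELETE.TEMP.HASHED", Output, RetCode)
If RetCode <> 0 Then Call DSLogWarn("Hashed file cleanup failed: " : Output, "AfterJobCleanup")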