Project Directory Growing Big
Hi,
Our Project Directory RootDir\Ascentail\DataStage\Projects\DEV is growing too big.
We think the growth may be due to temporary/intermediate files created while the project runs.
Can anybody guide us as to which folders (or files in those folders) we can safely delete, that are of no use and whose removal will not affect the project?
I mean something like clearing the &PH& folder.
Thanks in Advance.
Re: Project Directory Growing Big
A hint: backup your project. Delete the jobs, reimport them and reschedule them (if that's required for a development environment). The problem you're probably having is that the project log hash files are becoming too big as they only grow and never shrink.
For the rest: if you really do create temporary files in the project directory you can erase them, but back up your project before going that way... one slip-up and you "lose" your project.
Ogmios.
P.S. If you do create temporary files in the project directory, rewrite the jobs to put them somewhere else. Your own files in the project directories are a nightmare for DataStage cold installs (e.g. when migrating to a new machine).
P.P.S. And of course you can clean up the &PH& directory, but you have probably already automated that. I have not yet seen one DataStage site that does not clean up &PH&.
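For reference, the &PH& cleanup mentioned above is usually just an age-based delete of the scratch files in that directory. A minimal sketch from the OS shell, assuming a Unix install; the example path and the 7-day cutoff are illustrative assumptions, not from this thread:

```shell
# Hypothetical helper: purge old scratch files from a project's &PH& directory.
#   $1 = path to the &PH& directory
#   $2 = minimum age in days
# Adjust path and age for your site, and avoid running it while jobs
# are executing, since active jobs write their phantom output here.
clean_ph() {
    find "$1" -type f -mtime +"$2" -delete
}

# Example call (path is an assumption):
# clean_ph "/RootDir/Ascentail/DataStage/Projects/DEV/&PH&" 7
```

Scheduling something like this from cron is the usual way sites automate it.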
In theory there's no difference between theory and practice. In practice there is.
Are you sure log files shrink? Clearing a job log actually deletes rows one at a time; it does not use a truncate-type statement. Therefore, if a log file is a DYNAMIC hashed file, the only way to shrink it completely is either to issue a CLEAR.FILE command to recover it back to the minimum modulus and empty the overflow, or to delete the file and recreate it.
I did a quick test and confirmed that the data section does shrink, but the overflow stays inflated.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
You also need to get your data and intermediate files out of the project directory and into a dedicated data directory. Are you using the localuv account to store your hash files? Are you writing intermediate files to a subdirectory of your project? Put all files and hash files used by the jobs into a separate area and you will find it more manageable.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Space within a dynamic hashed file can also be reclaimed using the RESIZE command (but not below the number of groups specified by the MINIMUM.MODULUS parameter).
Code:
RESIZE RT_LOGnnn * * *
The three asterisks mean "do not change any of the tuning parameters".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
How it seems to work "from outside the box" is that the log hash files stay the biggest size they have ever been. Once you've had an enormous log file it will remain enormous afterwards, no matter how many "clear log"s you do, unless you do a RESIZE.
Maybe an enhancement request for Ascential: do an automatic RESIZE of the log hash files after executing "Clear log"; it can't be that hard to implement.
The easiest solution I've found for big projects is to export and import again (especially in a development environment). A BASIC job could also be written to go through the log hash files, do a "clear log" (without deleting the control records) and RESIZE them, but I've never bothered to implement it.
Ogmios
In theory there's no difference between theory and practice. In practice there is.
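Before resizing or re-importing, it helps to know which RT_LOG files are actually taking the space. A quick size listing from the OS shell is enough; this is a sketch assuming a Unix install, and the project path in the example is hypothetical:

```shell
# List the largest RT_LOG* hashed files in a project, biggest first.
#   $1 = path to the project directory (an assumption -- use your own)
largest_logs() {
    du -sk "$1"/RT_LOG* 2>/dev/null | sort -rn | head -n 10
}

# Example call (path is an assumption):
# largest_logs /RootDir/Ascentail/DataStage/Projects/DEV
```

The numbers in the RT_LOG names are internal job numbers, so the output identifies which jobs' logs to target.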
Re: Project Directory Growing Big
ogmios wrote: A hint: backup your project. Delete the jobs, reimport them and reschedule them (if that's required for a development environment). The problem you're probably having is that the project log hash files are becoming too big as they only grow and never shrink.

Is there a technical difference between doing a (1) backup, (2) delete jobs, (3) reimport versus just doing a (1) backup, (2) reimport? I know personally I would have just done the export/import and not bothered with deleting jobs as a middle step.
Ogmios.
ogmios wrote: P.P.S. And of course you can clean up the &PH& directory but you have already automated that. I have not yet seen one DataStage site that does not cleanup &PH&

I have, but it's always a site that is new to DataStage and doesn't have any DataStage developers with years of experience.
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.
"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
vmcburney wrote: You also need to get your data and intermediate files out of the project directory and into a dedicated data directory. Are you using the localuv account to store your hash files? Are you writing intermediate files to a subdirectory of your project? Put all files and hash files used by the jobs into a separate area and you will find it more manageable.

In the DataStage 3.x and 4.x era I used to prefer having staging and hashed files within the project directory; maybe everyone's data sets were much smaller then, but certainly today's best practice is as Vincent mentions here.
Also, I know there is a good thread out there with some easy ways to create individual file pointers to hashed files stored outside of the localuv account, but does anyone have a method or script to automate doing this for all hashed files?
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.
"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
Thanks for all the ideas.
I want to mention here the things we are doing and not doing.
We are writing all the intermediate files to a separate directory, outside the project directory.
We are not creating the hashed files outside of UV.
Is there a way we can delete the hashed files that are created temporarily during the project?
I mean the ones that get created every time we run a job.
During our project development, we have sometimes missed specifying the temp directory for the Sort stage.
The result is we have a lot of soa folders.
Thanks
Deleting Temporary Hashed Files
Is there a way we can delete the hash files that are created temporarily during the project?
Yes, it's called designing your job streams so that the hashed files are deleted.
Probably the easiest method is to use the Administrator client (Command window) to execute the commands to delete the hashed files. Then multi-select all those commands and choose Save, to save the list of commands under a single name.
That single name can subsequently be used, for example in an after-job routine ExecTCL or called from DSExecute, to delete all the hashed files.
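As an illustration of what such a saved command list contains, it is just the delete commands, one per line, stored under a single name in the Command window. The hashed-file names below are hypothetical examples, not from this thread:

```
DELETE.FILE TEMP.CUST.LOOKUP
DELETE.FILE TEMP.ORDER.KEYS
DELETE.FILE TEMP.STAGE.REF
```

Executing the saved name (for example via ExecTCL as an after-job subroutine) then runs each DELETE.FILE in turn.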
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.