cleaning up project
Moderators: chulett, rschirm, roy
Hi,
I need to clean up my project directory, which is occupying 4.5 GB of the 5 GB allocated to it. Is there any way to do this from within DataStage? If it has to be done outside DataStage, which files should I delete?
Thanks.
-
- Charter Member
- Posts: 560
- Joined: Wed Jul 13, 2005 5:36 am
- Location: Ohio
Since you're on Server, you don't have PX datasets. You will want to purge job logs and remove errant or retired hashed files and sequential files created within your project.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
rohit, I am working on Server edition.
Krazykoolrohit wrote:
Delete all the datasets. You can get the path where the datasets are stored from your config file. If a job aborts, datasets don't get deleted and stay there.
Apart from that, there can be some irrelevant project archives which you can look to delete.
-
- Charter Member
- Posts: 560
- Joined: Wed Jul 13, 2005 5:36 am
- Location: Ohio
I posted a script (viewtopic.php?t=95242) which will find all files greater than 500 MB. This is a good starting place.
Mamu Kim
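For readers who cannot open the linked topic, here is a minimal sketch of the same idea. It is not Kim's script: the function name find_big_files and its arguments are made up for illustration, and it assumes a POSIX find, which measures -size in 512-byte blocks.

```shell
#!/bin/sh
# Hypothetical helper (NOT the script from the linked topic): list regular
# files under a directory that are larger than a given size in megabytes,
# biggest first. Output is "<size in KB> <path>" as reported by du -k.
# Usage: find_big_files <dir> [min_mb]      (min_mb defaults to 500)
find_big_files() {
    dir=$1
    min_mb=${2:-500}
    # POSIX find counts -size in 512-byte blocks, so 1 MB = 2048 blocks.
    blocks=$((min_mb * 2048))
    find "$dir" -type f -size +"$blocks" -exec du -k {} \; | sort -rn
}
```

Called as find_big_files /path/to/project 500, it only reports the files. Nothing is deleted, so the list can be reviewed before anything is removed.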
-
- Premium Member
- Posts: 72
- Joined: Thu Sep 04, 2003 5:01 am
- Location: UK & Europe
Consider periodically clearing down the phantom directory. Phantom is the DataStage term for a subprocess, e.g. the process spawned by a routine called by another routine or by a job. The output from these phantoms is written to the phantom directory, named &PH&, beneath the DataStage project directory rather than to the DataStage job log. If there is a problem with the phantom, the output is displayed in the Director log when the job is reset or rerun, under the heading ... ' from previous run'.
The majority of phantom files are small (less than 100 bytes), so you are unlikely to reclaim much disk space, but a large number of small files in a directory can slow Unix down.
#!/bin/sh
###############################################################################
#
# Directory Housekeeping
#
# Purpose:   Remove old files from a phantom folder.
#            Pass the name of the project folder and the number of days to
#            retain. Files more than <days> old in the &PH& folder will be
#            deleted.
#
# Arguments: 1. Project directory name
#            2. Number of days to retain files
#
# Modification History
#
# Date        Author        Change
# -----------------------------------------------------------------------------
#
###############################################################################

# Parameter check
if [ $# -ne 2 ]
then
    echo "Usage - $0 <PROJ_DIR> <DAYS>"
    exit 1
fi

echo
echo "Removing files from phantom directory $1/'&PH&' more than $2 days old ..."
echo

# Make the phantom directory the current dir; stop if that fails so that
# find never deletes files from the wrong directory
cd "$1/&PH&" || exit 1

# Call find to identify files more than <days> old and remove them
find . -type f -mtime +"$2" -exec rm -f {} \;
Retcode=$?

# Restore previous current directory and exit with find's return code
cd -
exit $Retcode
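Before running a delete like the one above, it can be worth checking how many files it would touch. A minimal dry-run sketch follows. The function name count_old_phantoms is made up for illustration, and it assumes the same layout as the script: a &PH& directory directly under the project directory.

```shell
#!/bin/sh
# Hypothetical dry run: count the files in <project_dir>/&PH& that are more
# than <days> old, without deleting anything.
# Usage: count_old_phantoms <project_dir> <days>
count_old_phantoms() {
    # The & characters must be quoted, otherwise the shell treats them as
    # "run in background" operators.
    find "$1/&PH&" -type f -mtime +"$2" | wc -l
}
```

For example, count_old_phantoms /path/to/project 7 prints the number of files a 7-day retention would remove.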
True... that *is* a good practice, but as you noted not likely to reclaim much disk space. Ken has noted the Prime Suspects - huge job logs and errant crap that people have 'accidentally' put in the project. Start with the logs. Use Kim's script to help you find the biggens.
-craig
"You can never have too many knives" -- Logan Nine Fingers
I would bet the script identifies a lot of job logs that are huge. You can tie the job number in the hashed file name to an actual job name using some DataStage functions.
How many jobs do you have in your project(s)?
Some good practices:
1) Set each of your jobs to have Auto-Purge enabled (in Director | Job | Clear log), with a value of 1 or 2. IBM support may be willing to provide a script, if requested; better it come from them than from others, as it can be set to modify UniVerse values for all or for "specified" jobs.
2) NEVER allow developers to target the shared (DataStage) file system for application files (sequential, hashed, etc.). Always require that they explicitly specify a path to a different file system for application data.
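To see which job logs are the huge ones, a rough sketch along these lines may help. It rests on an assumption that should be checked against your own project: that each job's log lives in an entry named RT_LOG followed by the job number, directly under the project directory. The function name biggest_logs is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: show the largest job-log entries in a project,
# biggest first, with sizes in KB.
# ASSUMPTION: job logs are stored as RT_LOG<job number> under the project
# directory; verify this on your installation before relying on it.
# Usage: biggest_logs <project_dir> [count]      (count defaults to 10)
biggest_logs() {
    proj=$1
    count=${2:-10}
    # du -sk prints one total per matching entry; errors (for example, no
    # RT_LOG* entries at all) are discarded.
    du -sk "$proj"/RT_LOG* 2>/dev/null | sort -rn | head -n "$count"
}
```

The number at the end of each reported name is the job number mentioned above. Mapping it back to a job name, and purging the log itself, is best done through DataStage rather than by deleting files by hand.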