cleaning up project

sdfasih · Post by **sdfasih** » Tue Jul 11, 2006 10:20 am

Hi,
I need to clean up my project directory, which is occupying 4.5 GB of the 5 GB allocated to it. Is there any way to do this using DataStage or, if it has to be done outside of it, which files should I delete?
Thanks.

Krazykoolrohit · Post by **Krazykoolrohit** » Tue Jul 11, 2006 10:23 am

Delete all the datasets. You can get the path where the datasets are stored from your config file. If a job aborts, its datasets don't get deleted and stay there.

Apart from that, there may be some obsolete project archives that you can delete.

kcbland · Post by **kcbland** » Tue Jul 11, 2006 10:34 am

Since you're on Server, you don't have PX datasets. You will want to purge job logs and remove errant or retired hashed files and sequential files created within your project.

sdfasih · Post by **sdfasih** » Tue Jul 11, 2006 10:35 am

Krazykoolrohit wrote: Delete all the datasets. You can get the path where the datasets are stored from your config file. If a job aborts, datasets don't get deleted and stay there.

Apart from that there can be some irrelevant project archives which you can look to delete.

Rohit, I am working on Server edition.

Krazykoolrohit · Post by **Krazykoolrohit** » Tue Jul 11, 2006 11:31 am

ya my mistake.

You probably need to clear all your archives. There is nothing else hidden in DataStage that can increase your project size.

DSguru2B · Post by **DSguru2B** » Tue Jul 11, 2006 11:44 am

If you have access to ADN, there is a job posted there (.dsx file) which clears all the log files. Very useful and handy, you might want to look it up.

kduke · Post by **kduke** » Tue Jul 11, 2006 8:07 pm

I posted a script (viewtopic.php?t=95242) which will find all files greater than 500 MB. This is a good starting place.
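For anyone who cannot reach the linked post, here is a minimal sketch of the same idea. It is not the script from that thread. The helper name `find_big_files` and the optional size argument are made up for illustration. It relies only on the POSIX behaviour of `find -size`, which counts in 512-byte blocks.

```shell
#!/bin/sh
# Sketch only: list regular files under a directory that exceed a size threshold.
# find_big_files is a hypothetical helper name, not the script from the linked post.
# Arguments: 1. directory to search
#            2. threshold in MB (optional, defaults to 500)
find_big_files() {
    dir=$1
    mb=${2:-500}
    # POSIX find measures -size in 512-byte blocks, so 1 MB = 2048 blocks
    blocks=$((mb * 2048))
    find "$dir" -type f -size +"$blocks" -exec ls -l {} \;
}
```

Calling `find_big_files /path/to/project` would list everything over 500 MB under that (placeholder) path. Pass a smaller second argument to cast a wider net.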

SettValleyConsulting · Wed Jul 12, 2006 5:36 am

Consider periodically clearing down the phantom directory. Phantom is the DataStage term for a subprocess, e.g. the process spawned by a routine called by another routine or by a job. The output from these phantoms is written to the phantom directory, named &PH&, beneath the DataStage project directory rather than to the DataStage job log. If there is a problem with the phantom, the output is displayed in the Director log when the job is reset or rerun, under the heading ... ' from previous run'.

The majority of phantom files are small - less than 100 bytes - so you are unlikely to reclaim much disk space, but a large number of small files in a directory can slow Unix down.

#!/bin/sh
###############################################################################
#
# Directory Housekeeping
#
# Purpose:   Remove old files from a phantom folder.
#            Pass the name of the project folder and the number of days to
#            retain. Files more than <days> old in the &PH& folder will be
#            deleted.
#
# Arguments: 1. Project directory name
#            2. Number of days to retain files
#
#
# Modification History
#
# Date       Author     Change
# -----------------------------------------------------------------------------
#
#
###############################################################################

# Parameter check

if [ $# -ne 2 ]
then
    echo "Usage - $0 <PROJ_DIR> <DAYS>"
    exit 1
fi

echo
echo "Removing files from phantom directory $1/'&PH&' more than $2 days old ..."
echo

# Make the phantom directory the current dir. Stop if that fails, otherwise
# find would delete old files from whatever directory the script started in.

cd "$1/&PH&" || exit 1

# Call find to identify files more than <days> old and remove them

find . -type f -mtime +"$2" -exec rm -f {} \;

Retcode=$?

# Restore previous current directory and exit with find's return code

cd -

exit $Retcode
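
Before letting a script like the one above delete anything, it can help to see what it would remove. The following is a rough dry-run sketch using the same `find` test but printing instead of deleting. The helper name `ph_preview` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: list, but do not delete, phantom files older than <days>.
# ph_preview is a hypothetical helper name.
# Arguments: 1. project directory name
#            2. number of days to retain files
ph_preview() {
    proj=$1
    days=$2
    # Quote the path: the & characters in &PH& are shell metacharacters
    find "$proj/&PH&" -type f -mtime +"$days" -print
}
```

Piping the result through `wc -l` gives a quick count of how many files a real run would remove.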

chulett · Post by **chulett** » Wed Jul 12, 2006 6:21 am

True... that *is* a good practice, but as you noted not likely to reclaim much disk space. Ken has noted the Prime Suspects - huge job logs and errant crap that people have 'accidentally' put in the project. Start with the logs. Use Kim's script to help you find the biggens.

kduke · Post by **kduke** » Wed Jul 12, 2006 3:08 pm

Ray, biggens is NOT Texan. Bigguns is.

chulett · Post by **chulett** » Wed Jul 12, 2006 3:12 pm

Ray? And it was more of a 'Married with Children' reference rather than a Texian one.

kduke · Post by **kduke** » Wed Jul 12, 2006 3:28 pm

Texian?

newtier · Post by **newtier** » Fri Jul 14, 2006 12:20 pm

I would bet the script identifies a lot of job logs that are huge. You can tie the job number in the hash file name to an actual job name using some DataStage functions.

How many jobs do you have in your project(s)?

Some good practices:

1) Enable Auto-Purge on each of your jobs (in Director | Job | Clear log) and set a value of 1 or 2. (IBM support may be willing to provide a script, if requested. Better it come from them than from others, as it can be set to modify Universe values for all or "specified" jobs.)

2) NEVER allow developers to target the shared (DataStage) file system for application files (sequential, hash, etc.). Always require that they explicitly specify a path to a different file system for application data.
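
To see which job logs are the big ones before setting up purging, something along these lines may help. It is a sketch under one assumption not stated in this thread: that each job log is stored in the project directory under a name of the form RT_LOG followed by the job number. The helper name `log_sizes` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: report the largest job logs in a project directory together
# with the job number taken from each log's name.
# ASSUMPTION: job logs are stored in entries named RT_LOG<job number>.
# log_sizes is a hypothetical helper name.
# Arguments: 1. project directory
#            2. how many entries to show (optional, defaults to 10)
log_sizes() {
    proj=$1
    top=${2:-10}
    for log in "$proj"/RT_LOG*
    do
        # Skip the unexpanded pattern when nothing matches
        [ -e "$log" ] || continue
        kb=$(du -sk "$log" | awk '{print $1}')
        # Print "<size in KB> <job number>"
        echo "$kb ${log##*RT_LOG}"
    done | sort -rn | head -n "$top"
}
```

Each output line is a size in KB followed by a job number, largest first. The job number can then be tied to a job name as described above.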

kduke · Post by **kduke** » Fri Jul 14, 2006 11:58 pm

By the way, 5 GB is a joke. Buy a disk drive. They are cheap.