cleaning up project

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

sdfasih
Participant
Posts: 39
Joined: Wed May 24, 2006 7:22 pm

cleaning up project

Post by sdfasih »

Hi,
I need to clean up my project directory, which is occupying 4.5 GB of the 5 GB allocated to it. Is there a way to do this from within DataStage? If not, which files can I safely delete outside of it?
Thanks.
Krazykoolrohit
Charter Member
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

Delete all the datasets. You can get the path where the datasets are stored from your config file. If a job aborts, its datasets don't get deleted and stay there.

Apart from that, there may be some irrelevant project archives you can look to delete.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Since you're on Server, you don't have PX datasets. You will want to purge job logs and remove errant or retired hashed files and sequential files created within your project.
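
One common way to purge a single job's log from the engine command line, as a sketch rather than a definitive method: it assumes Server job logs live in RT_LOGnnn hashed files, that the engine shell is $DSHOME/bin/dssh, and the project path and job number 123 below are hypothetical.

#!/bin/sh
# Hedged sketch: clear the log hashed file for one Server job.
# The project path and job number 123 are hypothetical; look the number
# up first (e.g. in DS_JOBS) and take a backup before clearing anything.

cd /opt/dsadm/Projects/MyProject || exit 1

# CLEAR.FILE empties a hashed file without deleting it; RT_LOGnnn holds
# the log for job number nnn.
$DSHOME/bin/dssh "CLEAR.FILE RT_LOG123"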
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
sdfasih
Participant
Posts: 39
Joined: Wed May 24, 2006 7:22 pm

Post by sdfasih »

Krazykoolrohit wrote:Delete all the datasets. You can get the path where the datasets are stored from your config file. If a job aborts, its datasets don't get deleted and stay there.

Apart from that, there may be some irrelevant project archives you can look to delete.
Rohit, I am working on Server Edition.
Krazykoolrohit
Charter Member
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

Yeah, my mistake.

You probably need to clear all your archives. There is nothing else hidden in DataStage that would increase your project size.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

If you have access to ADN, there is a job posted there (a .dsx file) that clears all the log files. Very useful and handy; you might want to look it up.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

I posted a script at viewtopic.php?t=95242 which will find all files greater than 500 MB. That is a good starting place.
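
If you can't reach that post, the core idea can be approximated with a single find command. A sketch: the project path is hypothetical, and -size +500M is GNU find syntax.

#!/bin/sh
# Hedged sketch: list files over 500 MB under a project directory.
# -size +500M is GNU find syntax; on older Unixes use -size +1024000
# (512-byte blocks) instead. The project path is hypothetical.

find /opt/dsadm/Projects/MyProject -type f -size +500M -exec ls -lh {} \;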
Mamu Kim
SettValleyConsulting
Premium Member
Premium Member
Posts: 72
Joined: Thu Sep 04, 2003 5:01 am
Location: UK & Europe

Post by SettValleyConsulting »

Consider periodically clearing down the phantom directory. Phantom is the DataStage term for a subprocess, e.g. the process spawned by a routine called by another routine or by a job. The output from these phantoms is written to the phantom directory named &PH& beneath the DataStage project directory, rather than to the DataStage job log. If there is a problem with a phantom, its output is displayed in the Director log when the job is reset or rerun, under the heading '... from previous run'.

The majority of phantom files are small, less than 100 bytes, so you are unlikely to reclaim much disk space, but a large number of small files in a directory can slow Unix down.

#!/bin/sh
###############################################################################
#
# Directory Housekeeping
#
# Purpose: Remove old files from a project's phantom folder.
#          Pass the name of the project folder and the number of days to
#          retain. Files more than <days> old in the &PH& folder will be
#          deleted.
#
# Arguments: 1. Project directory name
#            2. Number of days to retain files
#
# Modification History
#
# Date          Author          Change
# -----------------------------------------------------------------------------
#
###############################################################################

# Parameter check

if [ $# -ne 2 ]
then
    echo "Usage - $0 <PROJ_DIR> <DAYS>"
    exit 1
fi

echo
echo "Removing files from phantom directory $1/'&PH&' more than $2 days old ..."
echo

# Make the phantom directory the current dir; abort if it does not exist

cd "$1/&PH&" || exit 1

# Call find to identify files more than <days> old and remove them

find . -type f -mtime +"$2" -exec rm -f {} \;

Retcode=$?

# Restore previous current directory and exit with find's status

cd -

exit $Retcode
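
A hypothetical invocation, assuming the script above is saved as clean_ph.sh and the project lives at /opt/dsadm/Projects/MyProject:

# Delete phantom output older than 7 days from the MyProject project
sh clean_ph.sh /opt/dsadm/Projects/MyProject 7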
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

True... that *is* a good practice but, as you noted, not likely to reclaim much disk space. Ken has noted the Prime Suspects: huge job logs and errant crap that people have 'accidentally' put in the project. Start with the logs. Use Kim's script to help you find the biggens. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Ray, biggens is NOT Texan. Bigguns is. :wink:
Mamu Kim
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ray? And it was more of a 'Married with Children' reference rather than a Texian one. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Texian? :P
Mamu Kim
newtier
Premium Member
Premium Member
Posts: 27
Joined: Mon Dec 13, 2004 5:50 pm
Location: St. Louis, MO

Post by newtier »

I would bet the script identifies a lot of job logs that are huge. You can tie the job number in the log hashed file's name back to an actual job name using some DataStage functions.
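
A hedged sketch of doing that from the shell, assuming logs live in RT_LOGnnn dynamic hashed files (which are directories on disk), that the engine shell is $DSHOME/bin/dssh, and that the project path and job number 123 are hypothetical:

#!/bin/sh
# Hedged sketch: rank job-log hashed files by size, then map a job
# number back to its job name. Path and job number 123 are hypothetical.

cd /opt/dsadm/Projects/MyProject || exit 1

# Dynamic hashed files are directories, so du reports their size.
du -sk RT_LOG* 2>/dev/null | sort -rn | head -10

# Look the job name up in the DS_JOBS file by job number.
$DSHOME/bin/dssh 'LIST DS_JOBS WITH JOBNO = "123"'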

How many jobs do you have in your project(s)?

Some good practices:

1) Set each of your jobs to auto-purge (in Director: Job | Clear Log), keeping only 1 or 2 previous runs. (IBM support may be willing to provide a script to enable this for all or "specified" jobs if requested; better it come from them than others, since it modifies Universe values.)

2) NEVER allow developers to target the shared (DataStage) file system for application files (sequential, hashed, etc.). Always require that they explicitly specify a path to a different file system for application data; see the sketch below for one rough way to spot strays.
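
A sketch of that stray check: the path is hypothetical, the name filters are a heuristic rather than an official list of DataStage internals, and -maxdepth is GNU find syntax.

#!/bin/sh
# Hedged sketch: flag large plain files sitting at the top level of a
# project directory, which are usually application data rather than
# DataStage's own objects. Review before deleting anything it prints.

cd /opt/dsadm/Projects/MyProject || exit 1

find . -maxdepth 1 -type f ! -name 'RT_*' ! -name 'DS_*' ! -name 'uv*' \
    -size +1000k -exec ls -lh {} \;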
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

By the way, 5 GB is a joke. Buy a disk drive. They are cheap.
Mamu Kim