High disk utilisation in project & DS Engine directory



rajngt
Participant
Posts: 32
Joined: Wed Jan 04, 2006 6:22 am


Post by rajngt »

We are facing high disk utilisation in the project directory (/etl/IS/Projects) and the engine directory (/opt/IBM/InformationServer/V8.5/).

Currently there are 28 projects. We are looking into the options below to reduce this.
1. Reduce the R/W in the project directory. We understand that much of the activity in the project directories comes from R/W operations on the RT_* directories, but we would like to know whether some of these operations can be moved out of these directories.
2. Related to the above: in version 8.1 we had the RTLogging and ORLogging options to choose where to log, but there is no such option in version 8.5, which makes it difficult for us to find out how to stop logging into the RT_LOG files.
3. We would like to know the possible reasons behind the R/W in the DSEngine/PXEngine directories and how this can be reduced.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Your UniVerse database is stored in your project directories.

If you wish to reduce the I/O on that mount, spread your projects across multiple project paths.

You'll have to create new projects and put them on a new mount.

There is very little you can/should do to minimize the amount of file interaction in the project path.

Helpful things to do:

Don't use your project path as your data directory. It is the default path where file stages will drop a file, so always fully qualify your paths to a location under some type of workspace mount.

Reduce the amount of text in your logs: turn off unnecessary debugging, increase the default settings for things like the Teradata progress intervals, etc.

I would not choose logging to a database over UniVerse. Made that mistake once. Ouch.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

All the RT_* entries are necessary. With the exception of RT_LOG* they should not grow very large. Keep RT_LOG* small by autopurging at as short an interval as you can tolerate.
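To see how much of a project's disk footprint sits in the RT_LOG* entries (and so how much a shorter auto-purge interval might recover), a minimal Python sketch like the following can total their sizes. It assumes only that the project directory is readable from where the script runs. Because a hashed file can be stored either as a single file or as a directory depending on its type, the sketch handles both; `dir_size_bytes` and `rt_log_sizes` are hypothetical helper names, and the sketch measures space, not I/O rate.

```python
import os
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Total size in bytes of all regular files under `path` (recursive)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):  # don't follow or double-count symlinks
                total += os.path.getsize(fp)
    return total


def rt_log_sizes(project_dir: str) -> list[tuple[str, int]]:
    """Return (entry name, bytes) for each RT_LOG* entry, largest first."""
    sizes = []
    for entry in Path(project_dir).glob("RT_LOG*"):
        # A hashed file may be a directory (with data/overflow files inside)
        # or a single file, so size it accordingly.
        size = dir_size_bytes(entry) if entry.is_dir() else entry.stat().st_size
        sizes.append((entry.name, size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

For example, `rt_log_sizes("/etl/IS/Projects/MYPROJECT")[:10]` would list the ten largest log files, where MYPROJECT is a placeholder project name.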

As Paul noted indirectly, projects can be created anywhere on the file system, other than the root directory. So you could try creating projects on other file systems.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

With IS 8.5, the internal method used to write job logs to the metadata repository was changed to avoid the primary issues Paul is probably referring to (very slow Director response times, and even degraded job performance). OR logging is now enabled using the Administrator client rather than by editing DSParams. Enabling OR logging does not disable RT logging (you don't have that option); it simply places a copy of the log in the repository as well.

If you need to keep logs for an extended period of time, either enable OR logging (and ensure your repository database can grow as needed to accommodate the logs) or regularly extract and archive copies of your logs. Job log management is an important aspect of any IS project.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Also keep your logs small by minimizing the messages that actually get logged, aggressively eliminating all that you can. I'm curious: what other kinds of "R/W" are you seeing there, if any? Hopefully you are not persisting any user data there...
-craig

"You can never have too many knives" -- Logan Nine Fingers
rajngt
Participant
Posts: 32
Joined: Wed Jan 04, 2006 6:22 am

Post by rajngt »

Moving a few projects to a different directory is the option we are evaluating.

The data files are not in the project directory.

Basically we have two types of job in our projects: one set runs every 15 minutes and the others run overnight.

We have currently set the option to keep 4 days of logs.
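The retention setting interacts with run frequency: a days-based purge keeps every run that falls inside the window, so a job scheduled every 15 minutes produces 24 × 60 / 15 = 96 runs per day and retains 384 runs' worth of log entries at 4 days, while an overnight job retains only 4. A small Python sketch of that arithmetic follows; `bytes_per_run` is a hypothetical figure that you would have to measure for your own jobs.

```python
MINUTES_PER_DAY = 24 * 60


def runs_retained(interval_minutes: int, days_kept: int) -> int:
    """Runs whose log entries survive a purge that keeps `days_kept` days."""
    runs_per_day = MINUTES_PER_DAY // interval_minutes
    return runs_per_day * days_kept


def estimated_log_bytes(interval_minutes: int, days_kept: int, bytes_per_run: int) -> int:
    """Rough steady-state log size for one job; bytes_per_run must be measured."""
    return runs_retained(interval_minutes, days_kept) * bytes_per_run


# 15-minute job vs. overnight (once-a-day) job, both kept for 4 days.
print(runs_retained(15, 4))               # 384
print(runs_retained(MINUTES_PER_DAY, 4))  # 4
```

Cutting retention to 1 day for the 15-minute jobs alone would drop each of them to 96 retained runs, which is the kind of trade-off the autopurge advice is about.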

As jwiles suggested archiving the logs: how do we do this, and how can we retrieve the logs if needed? Is it done in the IS console, or is it possible with DS Director?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

There is no built-in program or functionality to archive log files. That said, the log files are standard hashed files which can easily be read from DataStage server jobs; alternatively, you can write the logs to stdout from the command line and redirect them to a flat file for archival.
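As a sketch of the command-line route, the Python below runs whatever command you use to dump a job's log to stdout, gzips the output into a timestamped file, and reads it back when needed. The command is deliberately left as a parameter: check the dsjob documentation for your release for the exact log-reporting options rather than relying on this example. `archive_command_output` and `read_archive` are hypothetical helper names.

```python
import gzip
import subprocess
from datetime import datetime
from pathlib import Path


def archive_command_output(cmd: list[str], archive_dir: str, label: str) -> Path:
    """Run `cmd`, gzip its stdout into `archive_dir`, and return the archive path."""
    # check=True raises CalledProcessError if the command fails,
    # so a failed log dump never produces an empty "archive".
    result = subprocess.run(cmd, capture_output=True, check=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # one-second resolution
    out_dir = Path(archive_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{label}_{stamp}.log.gz"
    with gzip.open(target, "wb") as fh:
        fh.write(result.stdout)
    return target


def read_archive(path) -> str:
    """Retrieve an archived log as text."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return fh.read()
```

Retrieval is then just `read_archive(path)`; the archives are plain gzip files, so `zcat` works on them too.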

But I would recommend taking the approach that Craig suggested: write as little as possible to the log files. Although I've never done so in a production environment, if you are working with parallel jobs you can use message handlers to prevent messages from being written to the log at all.

Usually, though, it is the sequence jobs which generate the most output and the message handling mechanism doesn't work for them.
PaulVL
Premium Member
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Not knowing your environment... you might also want to check your TEMPDIR and sort space and make sure they're not on that device.
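One quick way to check whether temp and scratch space share a filesystem with the project directory is to compare device IDs. The Python sketch below does that with `os.stat().st_dev`; the paths you pass in (for example TEMPDIR, or the scratch disks listed in your parallel configuration file) are yours to supply, and the helper names are hypothetical. A matching device ID means the same filesystem, but different IDs do not prove different physical hardware, since separate mounts can still sit on the same disks.

```python
import os


def on_same_device(path_a: str, path_b: str) -> bool:
    """True if both paths are on the same filesystem (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def conflicting_paths(project_dir: str, scratch_paths: list[str]) -> list[str]:
    """Scratch/temp paths that share a filesystem with the project directory."""
    return [p for p in scratch_paths if on_same_device(project_dir, p)]
```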

Spacing those projects out onto different HW devices would be a good idea. Take note of which projects run in the same timeframe and use that as your distribution strategy.
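That distribution strategy can be sketched as a simple greedy assignment: place the busiest projects first, and put each one on the mount whose already-placed projects overlap its run window the least. This is an illustrative heuristic, not a DataStage feature; the project names, mount names, and hour-of-day windows are hypothetical.

```python
def assign_projects(windows: dict[str, set[int]], mounts: list[str]) -> dict[str, str]:
    """Greedy placement of projects onto mounts by run-window overlap.

    windows maps project name -> set of hours (0-23) in which it runs.
    Ties are broken by fewest projects on the mount, then by mount order.
    """
    placed = {m: [] for m in mounts}  # mount -> hour-sets of projects already there
    placement = {}
    # Busiest projects (widest run windows) are placed first.
    for project in sorted(windows, key=lambda p: len(windows[p]), reverse=True):
        hours = windows[project]

        def cost(mount: str) -> tuple[int, int]:
            overlap = sum(len(hours & other) for other in placed[mount])
            return (overlap, len(placed[mount]))

        best = min(mounts, key=cost)
        placed[best].append(hours)
        placement[project] = best
    return placement
```

With `{"NIGHT_A": {1, 2}, "NIGHT_B": {1, 2}, "DAY_C": {12}}` and two mounts, the two overnight projects land on different mounts, and the daytime project joins whichever mount it ties onto first.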

But if it's truly bogging you down... invest in better hardware.

How many tiers is your setup? What other processes are writing to that device? (not just mount)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

(Actually there is a built-in - but undocumented - utility for archiving and restoring hashed files - it's the "UniVerse" uvbackup and uvrestore commands.)

There is also a "UniVerse" bulk loader called loadfile, but you would need to save the log in loadfile format. There is a BASIC routine published on DSXchange (eons ago) for doing this.