High disk utilisation in project & DS Engine directory

rajngt · Post by **rajngt** » Sat Jul 28, 2012 2:47 pm

We are facing high disk utilisation in the project directory /etl/IS/Projects and in the Engine directory /opt/IBM/InformationServer/V8.5/

Currently there are 28 projects. We are looking into the options below to reduce this.
1. Reduce the R/W in the project directory. We understand that the R/W in project directories comes from operations in the RT_* directories, but would like to know whether some of these operations can be moved out of these directories.
2. Related to the above: in version 8.1 we had the RTLogging and ORLogging options to choose where to log. We cannot find an equivalent option in version 8.5, which makes it difficult for us to work out how to stop the logging into RT_Logs.
3. We would like to know the possible reasons behind the R/W in the DSEngine/PXEngine directory and how it can be reduced.

PaulVL · Post by **PaulVL** » Sat Jul 28, 2012 5:05 pm

Your UniVerse database is stored in your project directories.

If you wish to reduce the IO to that mount... spread your projects across multiple project paths.

You'll have to create new projects and put them on a new mount.

There is very little you can/should do to minimize the amount of file interaction in the project path.

Helpful things to do:

Don't use your project path as your data directory. This is the default path where all file stages will drop a file. Please fully qualify a path under some type of workspace mount.

Reduce the amount of text in your logs.
Turn off lots of debugging, increase default settings for stuff like the Teradata progress intervals, etc...

I would not choose logging to a database over the UniVerse. Made that mistake once. Ouch.

ray.wurlod · Post by **ray.wurlod** » Sat Jul 28, 2012 7:55 pm

All the RT_* entries are necessary. With the exception of RT_LOG* they should not grow very large. Keep RT_LOG* small by autopurging at as short an interval as you can tolerate.
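To see which RT_LOG* entries are actually taking the space before tightening the auto-purge settings, something like the following could be run against one project directory. It is only a sketch: it assumes the log files appear as files or directories whose names start with `RT_LOG` directly under the project directory, and the project name `MyProject` is hypothetical.

```python
import os
from pathlib import Path


def entry_size(path: Path) -> int:
    """Total bytes used by a file, or by all files under a directory."""
    if path.is_file():
        return path.stat().st_size
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += (Path(root) / name).stat().st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def largest_rt_entries(project_dir: str, prefix: str = "RT_LOG", top: int = 10):
    """Return (name, bytes) for the largest entries whose name starts with prefix."""
    sizes = [
        (p.name, entry_size(p))
        for p in Path(project_dir).iterdir()
        if p.name.startswith(prefix)
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    # Hypothetical project under the project path from the original post.
    project = "/etl/IS/Projects/MyProject"
    if os.path.isdir(project):
        for name, size in largest_rt_entries(project):
            print(f"{size / 1_048_576:10.1f} MiB  {name}")
```

Passing `prefix="RT_"` instead lists all run-time entries, which is a quick way to confirm that only the logs are growing.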

As Paul noted indirectly, projects can be created anywhere on the file system, other than the root directory. So you could try creating projects on other file systems.

jwiles · Post by **jwiles** » Sat Jul 28, 2012 8:14 pm

With IS 8.5, the internal method used to write job logs to the metadata repository was changed to avoid the primary issues Paul is probably referring to (very slow Director response times and possibly even degraded job performance). OR logging is now enabled using the Administrator client rather than by editing DSParams. Enabling OR logging does not disable RT logging (you don't have that option); it simply places a copy of the log into the repository.

If you need to keep logs for an extended period of time, either enable OR logging (and ensure your repository database can grow as needed to accommodate the logs) or regularly extract and archive copies of your logs. Job log management is an important aspect of any IS project.

Regards,

chulett · Post by **chulett** » Sat Jul 28, 2012 8:17 pm

And also keep your logs small by minimizing the messages that actually get logged by aggressively eliminating all that you can. I'm curious what other kinds of "R/W" you are seeing there? Anything? Hopefully you are not persisting any user data there...

rajngt · Post by **rajngt** » Sun Jul 29, 2012 4:01 am

Moving a few projects to a different directory is the option we are evaluating.

The data files are not in the project directory.

Basically we have two types of job in our projects: one set runs every 15 minutes and the others run overnight.

We currently have the option set to keep 4 days of logs.

jwiles mentioned the option of archiving the logs. How do we do this, and how can we retrieve the logs if needed? Is it done in the IS console, or is it possible with DS Director?

ArndW · Post by **ArndW** » Sun Jul 29, 2012 5:50 am

There is no built-in program or functionality to archive logfiles. That having been said, the log files are standard hashed files which can easily be read from DataStage server jobs, or you have the ability to output the log files to stdout from your command line and can redirect them to a flat file for archival as well.
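The command-line route could look roughly like the sketch below, which redirects the output of `dsjob -logsum` for one job into a timestamped flat file. It assumes `dsjob` is on the PATH with the engine environment already set up; the archive directory, the file naming scheme and the helper names are all hypothetical. The result is a plain text copy for reading outside Director; this sketch does not load anything back into the project.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def logsum_command(project: str, job: str, dsjob: str = "dsjob") -> list:
    """Build the argument list for a log-summary dump of one job."""
    return [dsjob, "-logsum", project, job]


def archive_name(archive_dir: str, project: str, job: str, when: datetime) -> Path:
    """Hypothetical naming scheme: <project>.<job>.<timestamp>.log"""
    return Path(archive_dir) / f"{project}.{job}.{when:%Y%m%d_%H%M%S}.log"


def archive_job_log(project: str, job: str, archive_dir: str,
                    dsjob: str = "dsjob") -> Path:
    """Run dsjob and redirect its stdout into a flat file in archive_dir."""
    target = archive_name(archive_dir, project, job, datetime.now())
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "wb") as out:
        # check=True raises CalledProcessError if dsjob exits non-zero.
        subprocess.run(logsum_command(project, job, dsjob), stdout=out, check=True)
    return target
```

A call such as `archive_job_log("MyProject", "MyJob", "/archive/dslogs")` (all three names made up) would be scheduled to run more often than the auto-purge interval, so that entries are copied out before they are purged.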

But I would recommend taking the approach that Craig has suggested - write as little as possible to the logfiles. Although I've never done so in a production environment, if you are working with parallel jobs you can prevent individual messages from even being written to the log file by using message handlers.

Usually, though, it is the sequence jobs which generate the most output and the message handling mechanism doesn't work for them.

PaulVL · Post by **PaulVL** » Sun Jul 29, 2012 10:03 am

Not knowing your environment... you might also want to check where your TEMPDIR and sort space point, and make sure they are not on that device.
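A quick way to check this is to compare the file system each path lives on. The sketch below uses `os.stat().st_dev` for that; the labels and the scratch path in the usage note are hypothetical. Note that `st_dev` identifies the file system, so two different mounts carved from the same physical disk will still look separate here.

```python
import os


def same_device(path_a: str, path_b: str) -> bool:
    """True when both paths live on the same file system (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def paths_sharing_device(project_dir: str, candidates: dict) -> list:
    """Return the labels of candidate paths on the same file system as project_dir.

    Unset (None/empty) or non-existent paths are skipped.
    """
    shared = []
    for label, path in candidates.items():
        if path and os.path.exists(path) and same_device(project_dir, path):
            shared.append(label)
    return shared
```

For example, `paths_sharing_device("/etl/IS/Projects", {"TMPDIR": os.environ.get("TMPDIR"), "scratch": "/scratch"})` returns the labels that should be moved elsewhere.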

Spacing those projects out onto different HW devices would be a good idea. Take note of which projects run in the same timeframe and use that as your distribution strategy.
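One way to turn run timeframes into a layout is a simple greedy assignment: take the projects in order of start time and put each one on the mount where it overlaps least with what is already there. The sketch below assumes each project can be summarised by a single (start hour, end hour) window within one day; the project names and mount paths are made up, and windows that cross midnight would need to be split first.

```python
def overlap(a: tuple, b: tuple) -> int:
    """Hours shared by two (start, end) windows on the same day."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def distribute(windows: dict, mounts: list) -> dict:
    """Greedily assign each project to the mount with the least overlapping run time."""
    placed = {m: [] for m in mounts}
    result = {}
    for project, window in sorted(windows.items(), key=lambda kv: kv[1]):
        # Ties go to the first mount in the list.
        best = min(mounts, key=lambda m: sum(overlap(window, w) for w in placed[m]))
        placed[best].append(window)
        result[project] = best
    return result


if __name__ == "__main__":
    # Hypothetical projects and mounts.
    windows = {"A": (0, 4), "B": (1, 5), "C": (6, 8)}
    print(distribute(windows, ["/mnt1", "/mnt2"]))
```

With these inputs A and B, which run at the same time, end up on different mounts, while C shares a mount with A because their windows do not overlap.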

But if it's truly bogging you down... invest in better hardware.

How many tiers does your setup have? What other processes are writing to that device? (not just mount)

ray.wurlod · Post by **ray.wurlod** » Sun Jul 29, 2012 4:52 pm

(Actually there is a built-in - but undocumented - utility for archiving and restoring hashed files - it's the "UniVerse" uvbackup and uvrestore commands.)

There is also a "UniVerse" bulk loader called loadfile, but you would need to save the log in loadfile format. There is a BASIC routine published on DSXchange (eons ago) for doing this.