Preload file to memory

Posted: Tue Aug 05, 2003 12:39 am
by spracht
To accelerate lookups on hash files, the option 'Preload file to memory' can be checked in the hash file stage. Some of our lookups are rather small, containing fewer than 20 records. Is it advisable to preload these files, even though they would allocate 128 MB (or whatever is configured on the system), just as the large lookup files do?

Stephan

Posted: Tue Aug 05, 2003 8:13 am
by kduke
Stephan

All of these options are trade-offs. It takes time to load the file into memory, and you have less memory for other processes to run in. I think the issue is this: if you have millions of rows to do lookups against, then I would do it. If you have lots of RAM, then I would always do it.

Try it both ways and let us know how much time you saved.

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com

Posted: Tue Aug 05, 2003 9:23 am
by kcbland
DataStage Server Edition only uses the memory necessary to cache the file, with a limit on the size. You can see this if you watch memory usage while your job runs. If a file exceeds the maximum caching setting, the job sits there for a few minutes while it tries to cache the file, then burps up a message saying the file is too big to cache, and doesn't.

The rule of thumb for read caching is to always do the small hash files, because it doesn't hurt. The big ones are at your discretion, based on how many times each row is going to be referenced. It wastes time to read-cache a hash file if each row is only going to be read once: you've now doubled the number of times each row is touched, once to load the cache and a second time when the job references it.
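
A rough back-of-the-envelope sketch of that break-even point, in Python (the row count and per-lookup timings below are invented for illustration, not DataStage measurements):

    # Back-of-the-envelope model; all timings are made-up placeholders.
    # Substitute figures measured on your own system.
    rows_in_hash_file = 1_000_000
    cache_load_cost = 2.0e-6      # seconds per row to pull it into the read cache (a disk read)
    disk_lookup_cost = 2.0e-6     # seconds per uncached hashed-file lookup
    cached_lookup_cost = 0.2e-6   # seconds per lookup once the row is in memory

    def job_lookup_seconds(refs_per_row, cached):
        """Total time the job spends on hashed-file reads."""
        lookups = rows_in_hash_file * refs_per_row
        if cached:
            return rows_in_hash_file * cache_load_cost + lookups * cached_lookup_cost
        return lookups * disk_lookup_cost

    # Referenced once per row: caching just doubles the reads and loses.
    # Referenced several times per row: the load cost is amortised and caching wins.
    for refs in (1, 2, 5):
        print(refs,
              round(job_lookup_seconds(refs, cached=False), 2),
              round(job_lookup_seconds(refs, cached=True), 2))

With those made-up numbers, caching loses when each row is referenced only once and starts to pay off from about two references per row, which is the trade-off described above.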

Kenneth Bland

Posted: Tue Aug 05, 2003 9:35 am
by spracht
Ken

Thank you very much for that interesting information. I only wonder why the manuals and online help are so perfectly unclear on things like that.

Stephan

Posted: Tue Aug 05, 2003 10:30 am
by kcbland
Because manuals show you things like how the tool works. They don't go further and show you what makes sense to do and what doesn't.

This is something I constantly try to explain to people. The best tool/database cannot defeat the worst programmer/modeler/consultant. The best programmer/modeler/consultant can defeat the worst tool/database. It's knowing the strengths and weaknesses of what you are working with, and deciding which functionality helps and which hurts. You have to use diagnostic tools to measure your job's interaction with the operating system. You have to become familiar with Performance Monitor on NT and with glance, top, prstat, etc. on Unix. You have to watch disk I/O and CPU utilization, and carefully design jobs to minimize multiple bottlenecks and the "masking" of performance issues.
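
A minimal sketch of that kind of watching, in Python using the psutil package (assumed to be available; in practice you would simply run Performance Monitor, glance, top or prstat directly while the job executes):

    import psutil

    # Sample CPU, memory and disk I/O once a second while a job is running.
    prev_io = psutil.disk_io_counters()
    for _ in range(60):
        cpu = psutil.cpu_percent(interval=1)    # percent, averaged over the last second
        mem = psutil.virtual_memory().percent
        io = psutil.disk_io_counters()
        read_mb = (io.read_bytes - prev_io.read_bytes) / 1e6
        write_mb = (io.write_bytes - prev_io.write_bytes) / 1e6
        prev_io = io
        print(f"cpu={cpu:5.1f}%  mem={mem:5.1f}%  read={read_mb:7.2f} MB/s  write={write_mb:7.2f} MB/s")

Running something like this alongside a job run makes it easy to see whether you are disk-bound or CPU-bound, and whether one bottleneck is masking another.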

Kenneth Bland

Posted: Tue Aug 05, 2003 12:57 pm
by ariear
Ken,

I made some observations on a W2K server with DS 6. It seems that when using pre-load into memory, DS allocates exactly the RAM it needs (the size of the hash file is already known); conversely, when the hash file size exceeds the tunable hash allocation parameter, DS informs you that it will not cache the file because of its size.

Posted: Tue Aug 05, 2003 4:19 pm
by ray.wurlod
With only 20 rows (probably only one or two groups), the hashed file is probably going to be resident in memory whether or not you use the pre-load to memory option. I don't believe it would make that much difference in this case.

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518