Preload file to memory


spracht
Participant
Posts: 105
Joined: Tue Apr 15, 2003 11:30 pm
Location: Germany

Post by spracht »

To accelerate lookups on hash files, the option 'Preload file to memory' can be checked in the hash file stage. Some of our lookups are rather small, containing fewer than 20 records. Is it advisable to preload these files, even though they allocate 128 MB (or whatever is configured on the system) just as the large lookup files do?

Stephan
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Stephan

All of these options are trade-offs. It takes time to load the file into memory, and it leaves less memory for other processes to run in. If you have millions of rows to do lookups against, I would do it. If you have lots of RAM, I would always do it.

Try it both ways and let us know how much time you saved.
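
A minimal sketch of how "try it both ways" could be measured, assuming you can wrap a job run in a Python callable. The names `time_runs`, `seconds_saved`, and `run_job` are hypothetical and not part of DataStage.

```python
import time


def time_runs(run_job, repeats=3):
    """Call run_job() `repeats` times; return wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_job()  # hypothetical: whatever launches the job and waits for it
        durations.append(time.perf_counter() - start)
    return durations


def seconds_saved(without_preload, with_preload):
    """Compare the best run of each variant; positive means preload was faster."""
    return min(without_preload) - min(with_preload)
```

Here `run_job` might wrap a `subprocess.run` call that starts the job and blocks until it finishes. Taking the best of several runs reduces noise from other activity on the server, though it will not remove it.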

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

DataStage Server Edition only uses the memory necessary to cache the file, up to a limit on the size. You can see this if you watch memory usage while your job runs. If a file exceeds the maximum caching setting, the job sits there for a few minutes while it tries to cache the file, then burps up a message saying the file is too big to cache, so it doesn't.

The rule of thumb for read caching is to always cache the small hash files, because it doesn't hurt. The big ones are at your discretion, based on how many times each row is going to be referenced. It wastes time to read-cache a hash file if each row is only going to be read once: you've doubled the number of times each row is referenced, once to cache it and a second time when the job references it.
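
The "doubled references" argument can be sketched as a simple count. This is a rough illustration only: it assumes every uncached lookup costs one disk read and ignores any operating-system file caching, and the function names are hypothetical.

```python
def row_touches(rows_in_file, lookups, preload):
    """Count row accesses for one job run under a deliberately crude model."""
    if preload:
        # Every row is read once to build the cache, then each lookup
        # probes memory.
        disk_reads = rows_in_file
        memory_probes = lookups
    else:
        # Assumption: each lookup goes to disk (no OS caching considered).
        disk_reads = lookups
        memory_probes = 0
    return {
        "disk_reads": disk_reads,
        "memory_probes": memory_probes,
        "total": disk_reads + memory_probes,
    }


def preload_reduces_disk_reads(rows_in_file, lookups):
    """Under this model, preloading pays off once lookups outnumber rows."""
    return lookups > rows_in_file
```

With 1,000 rows each looked up exactly once, preloading gives 2,000 accesses against 1,000 without it. With 20 rows and a million lookups, preloading replaces a million disk reads with 20.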

Kenneth Bland
spracht
Participant
Posts: 105
Joined: Tue Apr 15, 2003 11:30 pm
Location: Germany

Post by spracht »

Ken

Thank you very much for that interesting information. I only wonder why the manuals and online help are so perfectly unclear on things like that.

Stephan
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Because manuals show you things like how the tool works. They don't go further to show you what makes sense to do and what doesn't.

This is something I constantly try to explain to people. The best tool/database cannot defeat the worst programmer/modeler/consultant. The best programmer/modeler/consultant can defeat the worst tool/database. It's about knowing the strengths and weaknesses of what you are working with, and deciding which functionality helps and which hurts. You have to use diagnostic tools to measure your job's interaction with the operating system. You have to become familiar with Performance Monitor on NT and with glance, top, prstat, etc. on Unix. You have to watch disk I/O and CPU utilization, and carefully design jobs to minimize multiple bottlenecks and "masking" of performance issues.

Kenneth Bland
ariear
Participant
Posts: 237
Joined: Thu Dec 26, 2002 2:19 pm

Post by ariear »

Ken,

I made some observations on a W2K server with DS 6. It seems that when using pre-load into memory, DS allocates exactly the RAM it needs (the size of the hash file is already known). Conversely, when the hash file size exceeds the tunable hash allocation parameter, DS reports that it will not cache the file because of its size.
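
A rough way to anticipate that message before running a job is to compare the hash file's size on disk with the configured limit. This is only a sketch: the real check is internal to DataStage, on-disk size is just an approximation of what it measures, and the function names and the 128 MB default (the figure quoted earlier in this thread) are assumptions.

```python
import os


def hashed_file_bytes(path):
    """Size on disk of a hash file stored as a single file or a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def fits_in_cache(path, cache_limit_mb=128):
    """True if the on-disk size does not exceed the assumed cache limit."""
    return hashed_file_bytes(path) <= cache_limit_mb * 1024 * 1024
```

Pass whatever limit is actually configured on your system as `cache_limit_mb`, and treat the result as a hint rather than a guarantee of what the job will do.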
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

With only 20 rows (probably only one or two groups), the hashed file is probably going to be resident in memory whether or not you use the pre-load to memory option. I don't believe it would make that much difference in this case.

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518