Hash File Issues

Post questions here related to DataStage Server Edition, in areas such as Server job design, DS Basic, Routines, Job Sequences, etc.


snassimr
Premium Member
Posts: 281
Joined: Tue May 17, 2005 5:27 am

Hash File Issues

Post by snassimr »

Hi!

I have a performance question.
It was said that using "Pre-load file to memory" for a hash file is useful when the available memory is comparable to the hash file size. The question is: at what point does performance start to decrease? When the ratio is 1:10, or 1:100?

Has anybody discovered a rule of thumb?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

snassimr,

I am assuming that you are asking what happens when the hash file is larger than physical memory and gets paged out. UNIX is quite efficient at swapping while Windoze tends to be rather bad at it, so I would avoid planning around the swapping/paging mechanism on Windows. Even on UNIX a page operation is very slow compared to a memory lookup, and with a hash file the likelihood of a subsequent lookup landing on the same physical page as the previous one(s) is low - so you would constantly be attempting lookups on pages that are not in memory.

The result is that loading a file into virtual memory when you know it exceeds your physical space is a bad idea. With the intelligence of the pre-fetch mechanisms for disk I/O, plus the large amount of cache in most disk drives, controllers, and the OS disk buffer, you have about as much chance of getting a "hit" with normal disk reads as with a memory image.

All of this is just general theory and might not apply to what you are doing. If you have a large hash lookup table and your source is sorted in such a way that the key used to look up into this hash file is usually the same, then a memory file would make sense, since the pages in memory are being used constantly and thus are not flushed by the OS.
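As a rough illustration outside DataStage itself, here is a small Python sketch that compares the on-disk size of a dynamic hashed file with the memory that is actually free before you consider enabling "Pre-load file to memory". The DATA.30/OVER.30 file names are the usual layout for a dynamic (Type 30) hashed file, but the path and the 20% headroom factor are assumptions, so adjust them for your environment.

Code: Select all

# Rough pre-load sanity check: compare the on-disk size of a dynamic (Type 30)
# hashed file with the memory currently available on the box.
# Assumes the hashed file is a directory containing DATA.30 and OVER.30;
# the path and the 20% headroom factor are illustrative only.
import os
import psutil  # third-party package: pip install psutil

def hashed_file_size(path):
    """Sum the data and overflow portions of a dynamic hashed file."""
    total = 0
    for part in ("DATA.30", "OVER.30"):
        part_path = os.path.join(path, part)
        if os.path.exists(part_path):
            total += os.path.getsize(part_path)
    return total

hf_bytes = hashed_file_size("/data/project/MyLookupHF")  # hypothetical path
free_bytes = psutil.virtual_memory().available

print(f"hashed file size : {hf_bytes / 2**20:.1f} MB")
print(f"available memory : {free_bytes / 2**20:.1f} MB")

# Conservative rule: only consider pre-loading when the whole file,
# plus some headroom, fits in what is free right now.
if hf_bytes * 1.2 < free_bytes:
    print("Pre-load is worth testing")
else:
    print("File does not comfortably fit - expect paging, leave pre-load off")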
snassimr
Premium Member
Posts: 281
Joined: Tue May 17, 2005 5:27 am

Post by snassimr »

So the point is that the memory size must be greater than the hash file size?
A low ratio isn't enough?

I work on Windows.

"If you have a large hash lookup table and your source is sorted in such a way that the key used to lookup into this hash file is usually the same then a memory file would make sense"

Are you talking about a low-cardinality key?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

snassimr,

Each hash file and lookup combination has its own special rules, so I tried to give just a general explanation.

The cardinality of your key is not what matters; it is the cardinality of your lookups. The paging mechanism tends to remove from physical memory the pages that have not been accessed recently. If your lookups use keys that hash to different pages, then you can hit cases where each and every lookup forces a page to be physically retrieved from the paging file; if you do 100 consecutive lookups on the same key, the chances are very high that its page will remain in memory.
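To make that concrete, here is a toy Python simulation of an LRU-style page cache (this is not DataStage code, and the page size, cache size, and key counts are invented purely for illustration). It shows how lookups that keep hitting the same key stay resident, while lookups scattered across many keys fault on almost every access once the cache is smaller than the file.

Code: Select all

# Toy LRU "page cache": repeated lookups on one key stay resident,
# scattered lookups fault constantly once the cache is too small.
from collections import OrderedDict
import random

def count_page_faults(lookups, cache_pages):
    cache = OrderedDict()              # page_id -> None, ordered by recency
    faults = 0
    for key in lookups:
        page = key // 100              # pretend 100 keys hash into one page
        if page in cache:
            cache.move_to_end(page)    # recently used, keep it resident
        else:
            faults += 1                # page not resident: a "page-in"
            cache[page] = None
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return faults

n = 100_000
repeated  = [42] * n                                         # same key every time
scattered = [random.randrange(1_000_000) for _ in range(n)]  # random keys

print("repeated key :", count_page_faults(repeated, cache_pages=500), "faults")
print("scattered key:", count_page_faults(scattered, cache_pages=500), "faults")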

As a general rule for you in Windows - don't use up your physical memory.