Hashed File Settings

Post questions here related to DataStage Server Edition, covering areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

DS_SUPPORT
Premium Member
Posts: 232
Joined: Fri Aug 04, 2006 1:20 am
Location: Bangalore

Hashed File Settings

Post by DS_SUPPORT »

We work in product development, so we have some problems with the initial settings of the hashed file. We will always use dynamic hashed files only.

Take the example of an employee table, where the row count differs from customer to customer. Assume one customer has 10,000 rows in the employee table and another has 1,000,000.

So which settings should I choose when building the hashed file? In the Hashed File Calculator, should I enter 1,000,000 or 10,000 for "the number of records"? The MINIMUM MODULUS changes based on the number of records.

What would be the best settings for me in this case? And suppose I build my hashed file based on 1,000,000 records: will performance suffer if the file holds only 10,000 records, or when it exceeds 1,000,000 records?

Please advise.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Define "performance" in this context.

Rows/sec is meaningless. A well-sized hashed file will best handle the volume of data for which it is tuned, or a subset thereof. But it will still handle larger volumes efficiently by automatically managing its "table space" - hence the name "dynamic".
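
For illustration only, here is the sort of back-of-envelope sizing the Hashed File Calculator performs, done from the engine shell (uvsh). The file name EMP.LOOKUP, the ~100-byte average record size, and the 80% split load are assumptions for the example, not figures from your system:

   1,000,000 records x ~100 bytes/record  = ~100,000,000 bytes   (record size assumed)
   100,000,000 / (2048-byte group x 0.80) = ~61,000 groups

   CREATE.FILE EMP.LOOKUP DYNAMIC MINIMUM.MODULUS 61000

Size it for the larger volume and the 10,000-row customer simply leaves most groups empty; each key still hashes directly to its group, so lookups stay fast, at the cost of some disk space.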
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DS_SUPPORT
Premium Member
Posts: 232
Joined: Fri Aug 04, 2006 1:20 am
Location: Bangalore

Post by DS_SUPPORT »

By performance here I mean fetching results from the hashed file when it is used as a lookup.

So I have to size for the maximum number of rows when creating the hashed file.

My other doubt is about when records move to the overflow file, given that a dynamic hashed file extends or shrinks based on the data. How does the overflow file come into the picture here?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The overflow file could be used even if the hashed file contains very few groups. It's not related to the total volume of data - it's related to the evenness of spread among the available groups (pages) within the hashed file structure. The only control you have over that is choice of hashing algorithm, and that's not rocket science - only two choices available.
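
If you want to see how evenly your data is spread, you can check from the engine shell. ANALYZE.FILE reports the current modulus and how much data has landed in overflow (EMP.LOOKUP is just the example name from above):

   ANALYZE.FILE EMP.LOOKUP

The hashing algorithm is chosen when the file is created - GENERAL is the default and suits most key types, while SEQ.NUM suits purely sequential numeric keys. A sketch, reusing the assumed sizing from earlier:

   CREATE.FILE EMP.SEQKEYS DYNAMIC SEQ.NUM MINIMUM.MODULUS 61000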
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.