Hashed File Settings

Post questions here related to DataStage Server Edition, covering areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

DS_SUPPORT
Premium Member
Posts: 232
Joined: Fri Aug 04, 2006 1:20 am
Location: Bangalore

Hashed File Settings

Post by DS_SUPPORT »

We work in product development, so we have some problems with the initial settings of the hashed file. We will always use dynamic hashed files only.

Take the example of an employee table, where the row count differs from customer to customer. Assume one customer has 10,000 rows in the employee table and another has 1,000,000.

So which settings should I choose when building the hashed file? In the Hashed File Calculator, should I enter 1,000,000 or 10,000 for "the number of records"? The MINIMUM MODULUS changes based on the number of records.

What would be the best settings for me in this case? And suppose I build my hashed file based on 1,000,000 records: will performance suffer if the file holds only 10,000 records, or when it exceeds 1,000,000 records?

Please advise.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Define "performance" in this context.

Rows/sec is meaningless. A well-sized hashed file will best handle the volume of data for which it is tuned, or a subset thereof. But it will still handle larger volumes efficiently by automatically managing its "table space" - hence the name "dynamic".
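
For illustration only, here is the sort of back-of-envelope sizing the Hashed File Calculator performs, done from the engine shell (uvsh). The file name EMP.LOOKUP, the ~100-byte average record size, and the 80% split load are assumptions for the example, not figures from your system:

   1,000,000 records x ~100 bytes/record  = ~100,000,000 bytes   (record size assumed)
   100,000,000 / (2048-byte group x 0.80) = ~61,000 groups

   CREATE.FILE EMP.LOOKUP DYNAMIC MINIMUM.MODULUS 61000

Size it for the larger volume and the 10,000-row customer simply leaves most groups empty; each key still hashes directly to its group, so lookups stay fast, at the cost of some disk space.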
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DS_SUPPORT
Premium Member
Posts: 232
Joined: Fri Aug 04, 2006 1:20 am
Location: Bangalore

Post by DS_SUPPORT »

By performance here I mean fetching results from the hashed file when it is used as a lookup.

So I have to size for the maximum number of rows when creating the hashed file.

My other doubt is about when records move to the overflow file, given that a dynamic hashed file extends or shrinks based on the data. How does the overflow file come into the picture here?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The overflow file could be used even if the hashed file contains very few groups. It's not related to the total volume of data - it's related to the evenness of spread among the available groups (pages) within the hashed file structure. The only control you have over that is choice of hashing algorithm, and that's not rocket science - only two choices available.
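
If you want to see how evenly your data is spread, you can check from the engine shell. ANALYZE.FILE reports the current modulus and how much data has landed in overflow (EMP.LOOKUP is just the example name from above):

   ANALYZE.FILE EMP.LOOKUP

The hashing algorithm is chosen when the file is created - GENERAL is the default and suits most key types, while SEQ.NUM suits purely sequential numeric keys. A sketch, reusing the assumed sizing from earlier:

   CREATE.FILE EMP.SEQKEYS DYNAMIC SEQ.NUM MINIMUM.MODULUS 61000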
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.