DataStage Hashed File Calculator

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

venugopal81
Participant
Posts: 49
Joined: Sat Mar 26, 2005 12:19 am

DataStage Hashed File Calculator

Post by venugopal81 »

Hi All,


What is the purpose of the DataStage Hashed File Calculator?

Please provide me with detailed information about the DHFC.


thanks & regards
venu
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Venu,

There are several different types of hashed files; the two main categories are dynamic and static. Dynamic files, as the name implies, dynamically and automatically adjust their hashing and number of buckets as the file grows and shrinks. Static files, as the name implies, are not changed dependent upon data volumes. If you have a file that grows from 0 to many records, it can be relatively inefficient to use dynamic hashed files, especially if you know beforehand approximately what volumes you will have. Static hashed files are better in this case; but if you dimension them too small then they will have a lot of inefficient overflows.
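To make the overflow idea concrete, here is a minimal Python sketch (illustrative only, not DataStage code; the capacity of 8 records per group and the CRC32 stand-in hash are assumptions) that distributes keys over a fixed modulus the way a static file does, and counts the records that no longer fit in their primary group:

Code:

    import zlib
    from collections import Counter

    GROUP_CAPACITY = 8  # assumed records per group before it overflows

    def group_for(key: str, modulus: int) -> int:
        # Stand-in for the file's hashing algorithm: hash the key and
        # take the remainder modulo the number of groups.
        return zlib.crc32(key.encode()) % modulus

    def overflow_count(keys, modulus: int) -> int:
        per_group = Counter(group_for(k, modulus) for k in keys)
        return sum(max(0, n - GROUP_CAPACITY) for n in per_group.values())

    keys = [f"CUST{n:06d}" for n in range(10_000)]
    for modulus in (101, 401, 1201, 2003):
        print(f"modulus {modulus}: {overflow_count(keys, modulus)} overflow records")

A modulus dimensioned too small pushes most records into overflow; a generous one keeps overflow low, which is exactly the trade-off you face when dimensioning a static file.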

The tool lets you calculate hash file sizes and dimensions. It is not useful or necessary in most cases: playing around and changing files from dynamic (which is the default) to other types can result in huge performance losses, while the gains from tuning files are usually not that great.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

If you wish to use static hashed files rather than dynamic ones, then it will help you select a modulus. I never use static hashed files; they are not worth the effort. Someone needs to maintain them and understand them long after I am gone from the project. Even worse, they can exist, be sized poorly, and kill your performance while nobody knows they are there.

Static hash files cause more problems than they solve.
Mamu Kim
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Arnd beat me to the reply. Too slow nowadays.
Mamu Kim
Sunshine2323
Charter Member
Posts: 130
Joined: Mon Sep 06, 2004 3:05 am
Location: Dubai,UAE

DataStage Hashed File Calculator

Post by Sunshine2323 »

Hi,

This is an unsupported utility shipped on the installation CD.
It helps in deciding the modulus for the hashed file based on the record size, the number of records, and the key pattern.
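As a rough sketch of the kind of arithmetic such a calculator performs (the 2048-byte group size, 80% target load, and prime rounding below are illustrative assumptions, not HFC's exact formula):

Code:

    import math

    def suggest_modulus(num_records: int, avg_record_bytes: int,
                        group_bytes: int = 2048, target_load: float = 0.8) -> int:
        """Estimate how many groups are needed to hold the data."""
        data_bytes = num_records * avg_record_bytes
        groups = math.ceil(data_bytes / (group_bytes * target_load))
        # Static hashed files are traditionally given a prime modulus
        # to help spread keys evenly, so round up to the next prime.
        while any(groups % p == 0 for p in range(2, math.isqrt(groups) + 1)):
            groups += 1
        return groups

    # e.g. one million rows averaging 120 bytes each
    print(suggest_modulus(1_000_000, 120))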

Do a search on the forum for more insights on the same.
Warm Regards,
Amruta Bandekar

If A equals success, then the formula is: A = X + Y + Z. X is work. Y is play. Z is keep your mouth shut.
--Albert Einstein
venugopal81
Participant
Posts: 49
Joined: Sat Mar 26, 2005 12:19 am

Which hashed file gives good performance

Post by venugopal81 »

ArndW,

Which hashed file gives better performance, static or dynamic?

There are two hashing algorithms for dynamic hashed files, GENERAL and SEQ.NUM. Can you differentiate between these two?

thanks
venu

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You will have to specify what you mean by "performance" before that question can be answered!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sunshine2323
Charter Member
Posts: 130
Joined: Mon Sep 06, 2004 3:05 am
Location: Dubai,UAE

DataStage Hashed File Calculator

Post by Sunshine2323 »

There is a very good PowerPoint presentation on ADN called "Hash File Tips and Tricks" that answers your questions.
Warm Regards,
Amruta Bandekar

If A equals success, then the formula is: A = X + Y + Z. X is work. Y is play. Z is keep your mouth shut.
--Albert Einstein
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Venu,

Ray is absolutely right in asking you to define "performance". In terms of cars, my little sports car performs well for me, but it doesn't perform when I take my 8 kids to soccer practice. The van performs well there, but neither does a good job when I go off-roading...

In the realm of hashed files the key concept is "distribution". The more evenly your keys are distributed, the more efficient access times are. If your keys are sequential numeric, or kind-of-sequential text (i.e. AAA, AAB, AAC, AAD and so on), the SEQ.NUM algorithm might be better; if they are not, then GENERAL might be better. Notice I am being very vague; this is intentional, as each case is different.
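As a loose analogy in Python (these are not the actual UniVerse algorithms; a trailing-digits mapping and CRC32 merely stand in for SEQ.NUM and GENERAL):

Code:

    import zlib
    from collections import Counter

    MODULUS = 97
    keys = [f"ORDER{n:08d}" for n in range(10_000)]  # sequential keys

    def seqnum_style(key: str) -> int:
        # Driven by the numeric part of the key, like SEQ.NUM.
        digits = "".join(c for c in key if c.isdigit()) or "0"
        return int(digits) % MODULUS

    def general_style(key: str) -> int:
        # A general-purpose hash of the whole key, like GENERAL.
        return zlib.crc32(key.encode()) % MODULUS

    for name, fn in (("SEQ.NUM-style", seqnum_style),
                     ("GENERAL-style", general_style)):
        counts = Counter(fn(k) for k in keys)
        print(f"{name}: min {min(counts.values())}, max {max(counts.values())} per group")

For these perfectly sequential keys the SEQ.NUM-style mapping comes out almost perfectly even, while the general-purpose hash varies more; with non-sequential keys the comparison can easily reverse, which is why each case is different.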

For me the bottom line is that the gains made by playing with hashed file configurations are usually smaller than the potential losses from not being careful. UniVerse and Pick databases have been around longer than our current crop of 3NF systems, and those years of experience in setting defaults make it relatively safe to stay with the system's recommended default values.

The only exception that I continually see is when a DS job clears and re-writes a large hashed file. In this case you can fill the file, see what the modulus is, and then set MINIMUM.MODULUS to this value to avoid the overhead of growing and shrinking the file.
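A back-of-the-envelope sketch of why that helps, under the assumption that a dynamic file splits one group at a time as it grows (in practice you would read the real modulus from the loaded file, for example with UniVerse's ANALYZE.FILE):

Code:

    def splits_during_load(start_modulus: int, final_modulus: int) -> int:
        # Assumption: each split adds one group, so growing from the
        # starting modulus to the final one costs their difference.
        return max(0, final_modulus - start_modulus)

    final = 40_000  # modulus observed after a full load (illustrative figure)
    print(splits_during_load(1, final))      # default start: ~40,000 splits
    print(splits_during_load(final, final))  # presized file: 0 splits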

SEQ.NUM and GENERAL are just two different hashing algorithms. If your key tends to be numeric and sequential, you would get a better distribution with SEQ.NUM.