DataStage Hashed File Calculator

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

venugopal81
Participant
Posts: 49
Joined: Sat Mar 26, 2005 12:19 am

DataStage Hashed File Calculator

Post by venugopal81 »

Hi All,


What is the purpose of the DataStage Hashed File Calculator?

Please provide me with detailed information about the DHFC.


thanks & regards
venu
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Venu,

There are several different types of hashed files; the two main categories are dynamic and static. Dynamic files, as the name implies, dynamically and automatically adjust their hashing and number of buckets as the file grows and shrinks. Static files, as the name implies, are not changed dependent upon data volumes. If you have a file that grows from 0 to many records, it can be relatively inefficient to use dynamic hashed files, especially if you know beforehand approximately what volumes you will have. Static hashed files are better in this case; but if you dimension them too small then they will have a lot of inefficient overflows.
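To make the overflow idea concrete, here is a minimal Python sketch (illustrative only, not DataStage code; the capacity of 8 records per group and the CRC32 stand-in hash are assumptions) that distributes keys over a fixed modulus the way a static file does, and counts the records that no longer fit in their primary group:

Code:

    import zlib
    from collections import Counter

    GROUP_CAPACITY = 8  # assumed records per group before it overflows

    def group_for(key: str, modulus: int) -> int:
        # Stand-in for the file's hashing algorithm: hash the key and
        # take the remainder modulo the number of groups.
        return zlib.crc32(key.encode()) % modulus

    def overflow_count(keys, modulus: int) -> int:
        per_group = Counter(group_for(k, modulus) for k in keys)
        return sum(max(0, n - GROUP_CAPACITY) for n in per_group.values())

    keys = [f"CUST{n:06d}" for n in range(10_000)]
    for modulus in (101, 401, 1201, 2003):
        print(f"modulus {modulus}: {overflow_count(keys, modulus)} overflow records")

A modulus dimensioned too small pushes most records into overflow; a generous one keeps overflow low, which is exactly the trade-off you face when dimensioning a static file.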

The tool lets you calculate hash file sizes and dimensions. It is not useful or necessary in most cases: playing around and changing files from dynamic (which is the default) to other types can result in huge performance losses, while the gains from tuning files are usually not that great.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

If you wish to use static hashed files rather than dynamic ones, then it will help you select a modulus. I never use static hashed files; they are not worth the effort. Someone needs to maintain them and understand them long after I am gone from the project. Even worse, they can exist, be sized poorly, and kill your performance while nobody knows they are there.

Static hash files cause more problems than they solve.
Mamu Kim
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Arnd beat me to the reply. Too slow nowadays.
Mamu Kim
Sunshine2323
Charter Member
Posts: 130
Joined: Mon Sep 06, 2004 3:05 am
Location: Dubai,UAE

DataStage Hashed File Calculator

Post by Sunshine2323 »

Hi,

This is an unsupported utility shipped on the installation CD.
It helps in deciding the modulus for the hashed file based on the record size, the number of records, and the key pattern.
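As a rough sketch of the kind of arithmetic such a calculator performs (the 2048-byte group size, 80% target load, and prime rounding below are illustrative assumptions, not HFC's exact formula):

Code:

    import math

    def suggest_modulus(num_records: int, avg_record_bytes: int,
                        group_bytes: int = 2048, target_load: float = 0.8) -> int:
        """Estimate how many groups are needed to hold the data."""
        data_bytes = num_records * avg_record_bytes
        groups = math.ceil(data_bytes / (group_bytes * target_load))
        # Static hashed files are traditionally given a prime modulus
        # to help spread keys evenly, so round up to the next prime.
        while any(groups % p == 0 for p in range(2, math.isqrt(groups) + 1)):
            groups += 1
        return groups

    # e.g. one million rows averaging 120 bytes each
    print(suggest_modulus(1_000_000, 120))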

Do a search on the forum for more insights on the same.
Warm Regards,
Amruta Bandekar

If A equals success, then the formula is: A = X + Y + Z. X is work. Y is play. Z is keep your mouth shut.
--Albert Einstein
venugopal81
Participant
Posts: 49
Joined: Sat Mar 26, 2005 12:19 am

Which hashed file gives good performance

Post by venugopal81 »

ArndW,

Which hashed file gives better performance, static or dynamic?

There are two hashing algorithms for dynamic hashed files, GENERAL and SEQ.NUM. Can you differentiate between these two?

thanks
venu

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You will have to specify what you mean by "performance" before that question can be answered!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sunshine2323
Charter Member
Posts: 130
Joined: Mon Sep 06, 2004 3:05 am
Location: Dubai,UAE

DataStage Hashed File Calculator

Post by Sunshine2323 »

There is a very good PowerPoint presentation on ADN called "Hash File Tips and Tricks" that answers your questions.
Warm Regards,
Amruta Bandekar

If A equals success, then the formula is: A = X + Y + Z. X is work. Y is play. Z is keep your mouth shut.
--Albert Einstein
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Venu,

Ray is absolutely right in asking you to define "performance". In terms of cars, my little sports car performs well for me, but it doesn't perform when I take my 8 kids to soccer practice. The van performs well there, but neither does a good job when I go off-roading...

In the realm of hashed files the key concept is "distribution". The more evenly your keys are distributed, the more efficient access times are. If your keys are sequential numeric, or kind-of-sequential text (i.e. AAA, AAB, AAC, AAD and so on), the SEQ.NUM algorithm might be better; if they are not, then GENERAL might be better. Notice I am being very vague; this is intentional, as each case is different.
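As a loose analogy in Python (these are not the actual UniVerse algorithms; a trailing-digits mapping and CRC32 merely stand in for SEQ.NUM and GENERAL):

Code:

    import zlib
    from collections import Counter

    MODULUS = 97
    keys = [f"ORDER{n:08d}" for n in range(10_000)]  # sequential keys

    def seqnum_style(key: str) -> int:
        # Driven by the numeric part of the key, like SEQ.NUM.
        digits = "".join(c for c in key if c.isdigit()) or "0"
        return int(digits) % MODULUS

    def general_style(key: str) -> int:
        # A general-purpose hash of the whole key, like GENERAL.
        return zlib.crc32(key.encode()) % MODULUS

    for name, fn in (("SEQ.NUM-style", seqnum_style),
                     ("GENERAL-style", general_style)):
        counts = Counter(fn(k) for k in keys)
        print(f"{name}: min {min(counts.values())}, max {max(counts.values())} per group")

For these perfectly sequential keys the SEQ.NUM-style mapping comes out almost perfectly even, while the general-purpose hash varies more; with non-sequential keys the comparison can easily reverse, which is why each case is different.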

For me the bottom line is that the gains made by playing with hashed file configurations are usually smaller than the potential losses from not being careful. UniVerse and Pick databases have been around longer than our current crop of 3NF systems, and those years of experience in setting defaults make it relatively safe to stay with the system's recommended default values.

The only exception that I continually see is when a DS job clears and re-writes a large hashed file. In this case you can fill the file, see what the modulus is, and then set MINIMUM.MODULUS to this value to avoid the overhead of growing and shrinking the file.
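A back-of-the-envelope sketch of why that helps, under the assumption that a dynamic file splits one group at a time as it grows (in practice you would read the real modulus from the loaded file, for example with UniVerse's ANALYZE.FILE):

Code:

    def splits_during_load(start_modulus: int, final_modulus: int) -> int:
        # Assumption: each split adds one group, so growing from the
        # starting modulus to the final one costs their difference.
        return max(0, final_modulus - start_modulus)

    final = 40_000  # modulus observed after a full load (illustrative figure)
    print(splits_during_load(1, final))      # default start: ~40,000 splits
    print(splits_during_load(final, final))  # presized file: 0 splits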

SEQ.NUM and GENERAL are just two different hashing algorithms. If your key tends to be numeric and sequential, you would get a better distribution with SEQ.NUM.