Different types of Hash files

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Gokul
Participant
Posts: 74
Joined: Wed Feb 23, 2005 10:58 pm
Location: Mumbai


Post by Gokul »

Hi,

I am in search of answers for the following questions:

1. What are the different types of hash files?
2. Under what conditions should we use each type?
3. What is the performance difference between the types of hash file?

Thanks,
Gokul
Viswanath
Participant
Posts: 68
Joined: Tue Jul 08, 2003 10:46 pm

Post by Viswanath »

Hi Gokul,

I haven't worked on the Unix version of DS, but I bet you would find plenty of material on hash files if you search the forum. I am not sure whether there is any difference between Unix and Windows hash files.

Cheers,
Vishy
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Gokul,

A quick search for answers to this question returned more results than I can page through in my browser; I suggest you look there for some information.

The Hash file types are described in the UniVerse documentation, specifically in UniVerse System Description 9.6.pdf, which can be downloaded from IBM's Universe Documentation page.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Except for the pathnames there are no differences between hashed files on UNIX and on Windows.

The internal byte order may differ but that's governed by the type of CPU chip rather than by the operating system. There are, for example, some UNIX variants that run on Intel chips. In any case, the internal byte order is invisible to users.
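As a generic illustration of why byte order stays invisible (this uses Python's standard struct module and is not the actual on-disk layout of a hashed file), the same integer can be stored in either byte order and still decodes to the same value, provided the reader uses the same convention as the writer:

```python
import struct

value = 305419896  # 0x12345678

big = struct.pack(">I", value)     # big-endian layout (e.g. SPARC, POWER)
little = struct.pack("<I", value)  # little-endian layout (e.g. Intel x86)

print(big.hex())     # 12345678
print(little.hex())  # 78563412

# Each buffer read back with its own byte order yields the original value,
# so a user working at the record level never sees the difference.
assert struct.unpack(">I", big)[0] == value
assert struct.unpack("<I", little)[0] == value
```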
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by ray.wurlod »

1. What are the different types of hashed files?
Static (the number of groups is pre-set and does not change except through intervention) and dynamic (the number of groups can change depending on the volume of data stored). For more information, search the forum. There are seventeen "types" of static hashed file, but these simply represent static hashed files with different hashing algorithms.
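To see why the choice of hashing algorithm (the static "type") matters when the number of groups is fixed, here is a toy sketch. The two algorithms below are hypothetical stand-ins, not the actual UniVerse type 2-18 algorithms, and the modulus of 101 is an arbitrary assumption:

```python
from collections import Counter

def additive_hash(key: str) -> int:
    # Hypothetical algorithm A: sum of character codes.
    # Keys whose characters sum to the same total always collide.
    return sum(ord(c) for c in key)

def polynomial_hash(key: str) -> int:
    # Hypothetical algorithm B: position-sensitive polynomial hash.
    h = 0
    for c in key:
        h = (h * 31 + ord(c)) & 0xFFFFFFFF
    return h

def group_of(key: str, modulus: int, algorithm) -> int:
    # Static file: the modulus (number of groups) is fixed at creation,
    # so the group depends only on the key and the chosen algorithm.
    return algorithm(key) % modulus

def distribution(keys, modulus, algorithm) -> Counter:
    return Counter(group_of(k, modulus, algorithm) for k in keys)

keys = [f"{i:04d}" for i in range(1000)]  # sequential keys "0000".."0999"
MODULUS = 101  # fixed; only a resize would change it

for name, alg in [("additive", additive_hash), ("polynomial", polynomial_hash)]:
    dist = distribution(keys, MODULUS, alg)
    print(name, "groups used:", len(dist), "largest group:", max(dist.values()))
```

With these 1,000 sequential keys the additive algorithm puts every record into just 28 of the 101 groups (the largest holds 75 records), while the position-sensitive one spreads them across far more groups. Matching the algorithm to the shape of the keys is the point of having several static types.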

2. Under what conditions should we use each type?
To get started use the default type (dynamic) because it's easier. A perfectly tuned static hashed file will populate faster than a default dynamic hashed file because extra work is needed to "grow" the latter. In use for lookups they should work identically (see below), but we don't live in a perfect world where data are distributed ideally.
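The extra "growing" work can be illustrated with a toy model. This sketch swaps in textbook linear hashing as a stand-in for a dynamic file (the real UniVerse split/merge behaviour is not reproduced), uses CRC-32 as a hypothetical hashing algorithm, and assumes 4 records per group with a split at 80% load. It counts how many records get relocated while loading, compared with a static file pre-sized for the same volume:

```python
import math
import zlib

def hash_key(key: str) -> int:
    # Deterministic stand-in hash (CRC-32), not UniVerse's own algorithm.
    return zlib.crc32(key.encode())

class StaticFile:
    """Toy static file: modulus chosen up front, never changes."""

    def __init__(self, modulus: int):
        self.groups = [[] for _ in range(modulus)]
        self.moved = 0  # stays 0: records are never relocated

    def insert(self, key: str) -> None:
        self.groups[hash_key(key) % len(self.groups)].append(key)

class DynamicFile:
    """Toy linear-hashing file that grows one group at a time."""

    def __init__(self, group_capacity: int = 4, split_load: float = 0.8):
        self.capacity = group_capacity  # hypothetical records per group
        self.split_load = split_load    # hypothetical split threshold
        self.groups = [[]]
        self.level = 0       # current doubling round
        self.next_split = 0  # next group to split in this round
        self.count = 0
        self.moved = 0       # records relocated by splits (the extra work)

    def _address(self, key: str) -> int:
        m = 2 ** self.level
        g = hash_key(key) % m
        if g < self.next_split:  # group already split this round
            g = hash_key(key) % (2 * m)
        return g

    def insert(self, key: str) -> None:
        self.groups[self._address(key)].append(key)
        self.count += 1
        if self.count / (len(self.groups) * self.capacity) > self.split_load:
            self._split()

    def _split(self) -> None:
        m = 2 ** self.level
        old = self.groups[self.next_split]
        self.groups[self.next_split] = []
        self.groups.append([])  # new group's index is next_split + m
        for k in old:
            g = hash_key(k) % (2 * m)
            self.groups[g].append(k)
            if g != self.next_split:
                self.moved += 1
        self.next_split += 1
        if self.next_split == m:  # round complete: group count has doubled
            self.level += 1
            self.next_split = 0

    def __contains__(self, key: str) -> bool:
        return key in self.groups[self._address(key)]

keys = [f"ORD{i:06d}" for i in range(10_000)]

dynamic = DynamicFile()
for k in keys:
    dynamic.insert(k)

# Pre-size the static file for the same load: records / (capacity * fill).
static = StaticFile(math.ceil(len(keys) / (4 * 0.8)))
for k in keys:
    static.insert(k)

print("dynamic groups:", len(dynamic.groups), "records moved:", dynamic.moved)
print("static groups:", len(static.groups), "records moved:", static.moved)
```

Both files end up with a similar number of groups, but only the dynamic one pays for relocating records along the way; a static file that was sized correctly up front avoids that cost.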

3. What is the performance difference between the types of hashed file?
Define "performance" here. Hashed files work by using the primary key value to calculate the address of the group (page) containing the record. Therefore a perfectly tuned hashed file, irrespective of type, requires exactly one logical I/O operation to access a record. The different types and configuration settings thereof allow you to get as close as possible to "perfectly tuned" (essentially no overflowed groups).
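A rough model of the "one logical I/O" point: compute the group from the key, read the group's primary buffer, and follow overflow buffers only when the group holds more records than fit in one buffer. The buffer capacity of 8 records, the CRC-32 hashing and the two moduli are hypothetical values chosen for illustration:

```python
import zlib
from collections import defaultdict

RECORDS_PER_BUFFER = 8  # hypothetical: records that fit in one group buffer

def group_of(key: str, modulus: int) -> int:
    # Stand-in hashing algorithm (CRC-32), not UniVerse's own.
    return zlib.crc32(key.encode()) % modulus

def build(keys, modulus):
    groups = defaultdict(list)
    for k in keys:
        groups[group_of(k, modulus)].append(k)
    return groups

def reads_for(key, groups, modulus) -> int:
    # One logical read for the primary group buffer, plus one more for each
    # overflow buffer that must be followed before the record is reached.
    position = groups[group_of(key, modulus)].index(key)
    return position // RECORDS_PER_BUFFER + 1

def average_reads(keys, modulus) -> float:
    groups = build(keys, modulus)
    return sum(reads_for(k, groups, modulus) for k in keys) / len(keys)

keys = [f"CUST{i:06d}" for i in range(5000)]

print("well sized (1009 groups):", round(average_reads(keys, 1009), 3))
print("undersized (50 groups):", round(average_reads(keys, 50), 3))
```

With enough groups almost every lookup costs a single read; squeeze the same records into too few groups and most lookups have to walk several overflow buffers.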

Static hashed files need more regular maintenance (to resize them correctly) than dynamic hashed files (which resize themselves). Time must be allocated for analysis and implementation.
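Part of that analysis is choosing a new modulus. The helper below is a back-of-the-envelope sketch, not an IBM-supplied formula. It assumes a group occupies separation × 512 bytes, targets 80% fill to leave room for growth, and rounds up to a prime modulus, which is a common rule of thumb rather than a documented requirement:

```python
import math

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

def suggest_modulus(record_count: int, avg_record_bytes: int,
                    separation: int, fill: float = 0.8) -> int:
    """Rough static-file sizing; every parameter here is an assumption."""
    group_bytes = separation * 512   # assumed: separation in 512-byte units
    usable = group_bytes * fill      # headroom so groups do not overflow
    records_per_group = max(1, int(usable // avg_record_bytes))
    modulus = math.ceil(record_count / records_per_group)
    while not is_prime(modulus):     # rule of thumb: use a prime modulus
        modulus += 1
    return modulus

# Hypothetical example: 200,000 records averaging 120 bytes, separation 4.
print(suggest_modulus(200_000, 120, 4))
```

Rerunning a calculation like this as volumes change is the regular maintenance that dynamic files take care of for you.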

Note that the name is hashed file, not hash file. This refers to the fact that they use a hashing algorithm to determine the key's location among a finite number of groups.

There is a large body of work published in the past forty years about tuning hashed files; some of it is good, some of it isn't.
Last edited by ray.wurlod on Sun Jul 17, 2005 11:58 pm, edited 1 time in total.

Post by Gokul »

Thanks Ray.

Post by ray.wurlod »

Just out of curiosity, were these interview questions?

Post by Gokul »

Ray,
These were not interview questions.
Just trying to get more out of you :wink:


Gokul