Hash File

Post questions here related to DataStage Server Edition, in areas such as Server job design, DS Basic, Routines, Job Sequences, etc.


bdixon
Participant
Posts: 35
Joined: Thu Nov 20, 2003 5:45 pm
Location: Australia, Sydney

Hash File

Post by bdixon »

How do I know what modulus I should set a dynamic hash file to?
Is there a formula for achieving optimal performance?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

For a full description of what hash files are and how they work, read my lengthy post here:

viewtopic.php?t=85364

Your short answer is that you want to make sure that, once the data is loaded into the hash file, no data has fallen into the overflow file. You want to set the minimum modulus to something like a high-water mark. That way, all data being referenced will be found in the efficient data file, without having to spill over into the overflow.

There's a formula, but your head will hurt. Read my post above; it should tell you an easy way to go about setting a minimum size.
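
For a rough idea of the arithmetic behind that formula, here is a minimal sketch in DS Basic. The row count and average record size below are illustrative assumptions; substitute your own figures and treat the result as a starting point, not gospel.

* Sketch only: estimate a MINIMUM.MODULUS for a dynamic hashed file.
* ROWS and AVG.BYTES are illustrative assumptions; use your own figures.
ROWS = 500000                 ;* expected high-water-mark row count
AVG.BYTES = 100               ;* average record size in bytes, key included
GROUP.SIZE = 2048             ;* group size reported by ANALYZE.FILE
SPLIT.LOAD = 0.8              ;* default split load factor (80%)
MIN.MOD = INT((ROWS * AVG.BYTES) / (GROUP.SIZE * SPLIT.LOAD)) + 1
PRINT 'Suggested MINIMUM.MODULUS: ' : MIN.MOD

The idea is that the file starts splitting once the data exceeds the split load percentage of total capacity (modulus times group size), so you divide the expected data volume by the usable capacity of one group.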
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
bdixon
Participant
Posts: 35
Joined: Thu Nov 20, 2003 5:45 pm
Location: Australia, Sydney

Post by bdixon »

Does the ANALYZE.FILE command work? Will it help in setting the minimum modulus?
Could someone please explain it to me?
bdixon
Participant
Posts: 35
Joined: Thu Nov 20, 2003 5:45 pm
Location: Australia, Sydney

Post by bdixon »

These are the results I am getting from my hash file. What does it all mean?
I am really confused about the load factors.

>ANALYZE.FILE DAILY.SAVINGS
File name .................. DAILY.SAVINGS
Pathname ................... DAILY.SAVINGS
File type .................. DYNAMIC
Hashing Algorithm .......... GENERAL
No. of groups (modulus) .... 19412 current ( minimum 19400 )
Large record size .......... 1628 bytes
Group size ................. 2048 bytes
Load factors ............... 80% (split), 50% (merge) and 80% (actual)
Total size ................. 49999872 bytes
Amos.Rosmarin
Premium Member
Posts: 385
Joined: Tue Oct 07, 2003 4:55 am

Post by Amos.Rosmarin »

On your DataStage installation CD, under UTILITIES\Unsupported\HFC, you'll find a utility (the Hashed File Calculator) to help you calculate the hash file properties.

If you have a specific case of a very slow hash file, you can post the file's metadata properties and I'm sure people here will help you get to the right parameters. In everyday life, just accept the defaults.

HTH,
Amos
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The load factors are two thresholds that govern when the hashed file will grow (split) or shrink (merge).

When the total amount of data in the file, as a proportion of the file's storage capacity (the "current load"), exceeds the split load threshold, the file grows by one group: one existing group splits, and some of its records are moved to the newly created group.

When the current load falls below the merge load threshold, the reverse operation occurs: records from the highest-numbered group are merged into a lower-numbered group, and the file shrinks by one group, theoretically losing the highest-numbered group.

Under some circumstances that group will remain part of the physical file structure, managed as free space, to reduce the overhead of allocating a group buffer when the file next needs to grow.

To get more (and more useful) information from ANALYZE.FILE, specify the STATS option, for example ANALYZE.FILE DAILY.SAVINGS STATS
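
If you do want to change those parameters on an existing dynamic file, here is a sketch from DS Basic, assuming the UniVerse CONFIGURE.FILE verb is available in your project account (verify the phrase syntax on your release before relying on it):

* Sketch only: raise the minimum modulus and set the load factor
* thresholds on an existing dynamic hashed file, then re-analyze.
EXECUTE 'CONFIGURE.FILE DAILY.SAVINGS MINIMUM.MODULUS 19400 SPLIT.LOAD 80 MERGE.LOAD 50'
EXECUTE 'ANALYZE.FILE DAILY.SAVINGS STATS'   ;* confirm the new settings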
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Now you know why I stated in my earlier post to just make sure you don't have any data in your overflow. It's a pretty good assumption that your dynamic hash file is in an optimal state if there's no data in the overflow. The only (minor) worry you would have is that the file is oversized, wasting a lot of space.
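
If you want to automate that overflow check, one approach is to capture the ANALYZE.FILE output from DS Basic and scan it, as in this sketch (the exact report wording varies by release, so check what your version prints):

* Sketch only: capture ANALYZE.FILE STATS output (one line per field
* mark) and print any lines mentioning overflow.
EXECUTE 'ANALYZE.FILE DAILY.SAVINGS STATS' CAPTURING REPORT
LOOP
   REMOVE LINE FROM REPORT SETTING MORE
   IF INDEX(UPCASE(LINE), 'OVERFLOW', 1) THEN PRINT LINE
WHILE MORE DO REPEAT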
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle