Questions regarding Hash files and hash file stage
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
In the partitioning algorithm, the hash value is divided by the number of nodes and the remainder is the node number to be used. Similarly, in the hashing algorithm for a hashed file, the hash value is divided by the number of groups in the hashed file and the remainder (plus 1) is the group number to be used. Multiplying the group number by the page size yields the address of the group in the file.
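That remainder arithmetic can be sketched as follows. This is an illustrative model only, not the actual engine code; the function names and the 4 KB page size are assumptions for the example.

```python
# Sketch of remainder-based placement, per the description above.
# Not the real DataStage internals; names and page size are illustrative.

def node_for_key(hash_value: int, num_nodes: int) -> int:
    """Partitioning: the remainder selects the processing node."""
    return hash_value % num_nodes

def group_address(hash_value: int, num_groups: int, page_size: int) -> int:
    """Hashed file: remainder + 1 gives the group number; multiplying
    by the page size yields the group's offset in the file."""
    group_number = hash_value % num_groups + 1
    return group_number * page_size

# Example: place hash 123456 across 4 nodes, and locate its group in a
# file with 112526 groups (the modulus from the output below) and 4 KB pages.
print(node_for_key(123456, 4))
print(group_address(123456, 112526, 4096))
```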
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Found the article I was thinking of, you can find it in the Learning Center here. There's also a post here with some discussion and a link to another product's pdf on their dynamic file implementation, similar enough to be helpful here.
-craig
"You can never have too many knives" -- Logan Nine Fingers
Looks like I'm still not clear on this.
Could anyone please explain the following?
No. of groups (modulus) .... 112526 current ( minimum 1, 0 empty, 23893 overflowed, 1602 badly )
What does "112526 current" mean? I assume that is the number of groups. Then what do the rest of the terms mean, and how are they related to "current"? Why don't they add up to 112526? How are the above terms related to the files Data.30 and Over.30?
And when I resize the hashed file why does current and minimum become same?
No. of groups (modulus) .... 133963 current ( minimum 133963, 0 empty, 8588 overflowed, 0 badly )
In a dynamic hashed file the number of groups (also known as modulus) can change second by second as data are added and removed. Hence the term "current" as at the time the utility (possibly ANALYZE.FILE) was run.
The minimum value for number of groups is set by the MINIMUM.MODULUS keyword in CREATE.FILE or RESIZE commands. I don't believe it should automatically be set to current by RESIZE, but don't doubt your results.
A group consists of one page (or "buffer") in the DATA.30 file and zero or more pages in the OVER.30 file, linked by pointers in a special kind of doubly linked list that allows for repairs.
A group with zero pages in OVER.30 is said to be well-tuned. These groups account for the arithmetic discrepancy as they're not explicitly reported.
Empty groups suggest that the file is considerably over-sized, and could be made physically smaller.
A group with one page in OVER.30 is said to be overflowed. It will cease being overflowed when that group splits during the regular dynamic file growth.
A group with more than one page in OVER.30 is said to be badly overflowed, and will require two or more split cycles before it ceases to be overflowed.
A perfectly tuned hashed file has no overflowed or badly overflowed or empty groups. This is almost impossible to achieve in practical processing (it's also affected by the hashing algorithm and GROUP.SIZE settings), so we aim to minimize the number of overflowed groups.
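The classification above can be sketched as a small helper that tallies groups by their OVER.30 page count. This is a simplified illustration (it ignores the separate "empty" category, and the function name is an assumption), but it shows why the reported figures need not sum to the total: well-tuned groups are simply not listed.

```python
from collections import Counter

def classify_groups(overflow_pages: list[int]) -> Counter:
    """Tally groups by their OVER.30 page count, per the rules above:
    0 pages = well-tuned, 1 page = overflowed, 2+ = badly overflowed.
    Illustrative only; 'empty' groups would need record counts as well."""
    counts = Counter()
    for pages in overflow_pages:
        if pages == 0:
            counts["well-tuned"] += 1
        elif pages == 1:
            counts["overflowed"] += 1
        else:
            counts["badly overflowed"] += 1
    return counts

# Toy file with 6 groups: four well-tuned, one overflowed, one badly overflowed.
print(classify_groups([0, 0, 1, 0, 3, 0]))
```

Applied to the output quoted earlier: 112526 current minus 23893 overflowed, 1602 badly overflowed and 0 empty leaves 87031 well-tuned groups, which is the unreported balance.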
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.