Questions regarding Hash files and hash file stage
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
In the partitioning algorithm, the hash value is divided by the number of nodes and the remainder is the node number to be used. Similarly, in the hashing algorithm for a hashed file, the hash value is divided by the number of groups in the hashed file and the remainder (plus 1) is the group number to be used. Multiplying the group number by the page size yields the address of the group in the file.
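That remainder arithmetic can be sketched as follows. This is an illustrative model only, not the actual engine code; the function names and the 4 KB page size are assumptions for the example.

```python
# Sketch of remainder-based placement, per the description above.
# Not the real DataStage internals; names and page size are illustrative.

def node_for_key(hash_value: int, num_nodes: int) -> int:
    """Partitioning: the remainder selects the processing node."""
    return hash_value % num_nodes

def group_address(hash_value: int, num_groups: int, page_size: int) -> int:
    """Hashed file: remainder + 1 gives the group number; multiplying
    by the page size yields the group's offset in the file."""
    group_number = hash_value % num_groups + 1
    return group_number * page_size

# Example: place hash 123456 across 4 nodes, and locate its group in a
# file with 112526 groups (the modulus from the output below) and 4 KB pages.
print(node_for_key(123456, 4))
print(group_address(123456, 112526, 4096))
```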
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Found the article I was thinking of, you can find it in the Learning Center here. There's also a post here with some discussion and a link to another product's pdf on their dynamic file implementation, similar enough to be helpful here.
-craig
"You can never have too many knives" -- Logan Nine Fingers
Looks like I'm still not clear on this.
Could anyone please explain the following?
No. of groups (modulus) .... 112526 current ( minimum 1, 0 empty, 23893 overflowed, 1602 badly )
What does "112526 current" mean? I assume that is the number of groups. Then what do the rest of the terms mean, and how are they related to "current"? Why don't they add up to 112526? How are the above terms related to the files Data.30 and Over.30?
And when I resize the hashed file why does current and minimum become same?
No. of groups (modulus) .... 133963 current ( minimum 133963, 0 empty, 8588 overflowed, 0 badly )
In a dynamic hashed file the number of groups (also known as modulus) can change second by second as data are added and removed. Hence the term "current" as at the time the utility (possibly ANALYZE.FILE) was run.
The minimum value for number of groups is set by the MINIMUM.MODULUS keyword in CREATE.FILE or RESIZE commands. I don't believe it should automatically be set to current by RESIZE, but don't doubt your results.
A group consists of one page (or "buffer") in the DATA.30 file and zero or more pages in the OVER.30 file, linked by pointers in a special kind of doubly linked list that allows for repairs.
A group with zero pages in OVER.30 is said to be well-tuned. These groups account for the arithmetic discrepancy as they're not explicitly reported.
Empty groups suggest that the file is considerably over-sized, and could be made physically smaller.
A group with one page in OVER.30 is said to be overflowed. It will cease being overflowed when that group splits during the regular dynamic file growth.
A group with more than one page in OVER.30 is said to be badly overflowed, and will require two or more split cycles before it ceases to be overflowed.
A perfectly tuned hashed file has no overflowed or badly overflowed or empty groups. This is almost impossible to achieve in practical processing (it's also affected by the hashing algorithm and GROUP.SIZE settings), so we aim to minimize the number of overflowed groups.
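The classification above can be sketched as a small helper that tallies groups by their OVER.30 page count. This is a simplified illustration (it ignores the separate "empty" category, and the function name is an assumption), but it shows why the reported figures need not sum to the total: well-tuned groups are simply not listed.

```python
from collections import Counter

def classify_groups(overflow_pages: list[int]) -> Counter:
    """Tally groups by their OVER.30 page count, per the rules above:
    0 pages = well-tuned, 1 page = overflowed, 2+ = badly overflowed.
    Illustrative only; 'empty' groups would need record counts as well."""
    counts = Counter()
    for pages in overflow_pages:
        if pages == 0:
            counts["well-tuned"] += 1
        elif pages == 1:
            counts["overflowed"] += 1
        else:
            counts["badly overflowed"] += 1
    return counts

# Toy file with 6 groups: four well-tuned, one overflowed, one badly overflowed.
print(classify_groups([0, 0, 1, 0, 3, 0]))
```

Applied to the output quoted earlier: 112526 current minus 23893 overflowed, 1602 badly overflowed and 0 empty leaves 87031 well-tuned groups, which is the unreported balance.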
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.