Physical size of dynamic hash file

Post questions here related to DataStage Server Edition, covering such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Suttond
Premium Member
Posts: 10
Joined: Wed Apr 09, 2003 11:15 am

Physical size of dynamic hash file

Post by Suttond »

Having written 655,000 rows of a single varchar field of 10 bytes to a hashed file, I believed the maximum size would be (10 + 2) * 655,000 = 7.86 MB. The total size of the hashed file (data + over) is 21.6 MB. Can anyone explain this?
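
For reference, here is the question's arithmetic as a quick Python sketch; the 2 bytes of per-row overhead is the poster's own assumption:

    # The naive estimate from the question: 655,000 rows of a 10-byte
    # varchar, allowing 2 bytes of per-row overhead (poster's assumption).
    rows = 655_000
    naive_mb = rows * (10 + 2) / 1e6
    print(f"naive estimate: {naive_mb:.2f} MB")  # 7.86 MB
    print(f"observed size:  21.6 MB ({21.6 / naive_mb:.1f}x larger)")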
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What was your 'minimum modulus' setting in the stage?
-craig

"You can never have too many knives" -- Logan Nine Fingers
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Hashed files are BINARY files that pre-allocate space and store data using a placement algorithm within groups. When groups get full, they spill into overflow space. Dynamic files grow and reshuffle their data when they hit predetermined limits.

Hence, as Craig mentioned, the answer depends on what you have set the minimum modulus to.
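
To make that concrete, here is a toy Python sketch of that growth behaviour. It is not UniVerse's actual linear-hashing algorithm, and the 28-byte record size and 80% split threshold are illustrative assumptions:

    # Toy sketch of dynamic-file growth: when average load exceeds a
    # split threshold, the modulus increases (another group is added).
    # Real UniVerse files also relocate records on each split; this
    # sketch only tracks the group count. The 28-byte record size and
    # the 0.8 split load are assumptions, not measured values.
    GROUP_BYTES = 2048
    SPLIT_LOAD = 0.8
    RECORD_BYTES = 28

    modulus = 1  # the default minimum modulus, as in this thread
    for records in range(1, 655_001):
        if records * RECORD_BYTES / (modulus * GROUP_BYTES) > SPLIT_LOAD:
            modulus += 1

    print(f"final modulus: {modulus}")
    print(f"data groups:   {modulus * GROUP_BYTES / 1e6:.1f} MB")

Even this crude model lands in the same ballpark as the 21.6 MB reported above.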
Suttond
Premium Member
Posts: 10
Joined: Wed Apr 09, 2003 11:15 am

Post by Suttond »

The minimum modulus was left at the default of 1.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Every record has a minimum storage overhead of 13 bytes, and it is usually larger.

There are forward and backward pointers (each 32 or 64 bits) and another "flag word" of the same size. There is a single byte between the key and the data (a "segment mark"), a single byte between each field (a "field mark"), and the whole record is padded to a multiple of 32 or 64 bits.
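
Turning that layout into arithmetic, a sketch assuming 32-bit words; the 6-byte key length is an illustrative assumption:

    # Per-record on-disk size per the layout above, assuming 4-byte
    # (32-bit) pointers and flag word. The key length is illustrative.
    def record_bytes(key_len, field_lens, word=4):
        header = 3 * word                    # forward ptr, backward ptr, flag word
        body = key_len + 1                   # key plus its segment-mark byte
        body += sum(field_lens)              # the data itself
        body += max(len(field_lens) - 1, 0)  # field marks between fields
        return -(-(header + body) // word) * word  # pad up to a word multiple

    # An assumed 6-byte key plus the single 10-byte varchar field:
    print(record_bytes(6, [10]))  # -> 32 bytes, not the naive 12

At roughly 32 bytes per row under these assumptions, 655,000 rows already need about 21 MB before any group padding or overflow is counted.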

Every group ("page") is padded out to a multiple of (GROUP.SIZE * 2KB). This probably accounts for most of the "discrepancy" that you reported.

If a record is oversized (larger than specified by LARGE.RECORD) then an additional two pointers are created, and extra pages are created in the OVER.30 file to store that record's data.

If there are overflowed groups, these will also generate extra pages in the OVER.30 file. If the hashed file was created as a UniVerse table, then there will be an extra page (the "SICA block") in OVER.30.

The DATA.30 file will always be (current_modulus + 1) * GROUP.SIZE * 2 KB in size.
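
Plugging in numbers as a sketch (GROUP.SIZE = 1 is the default; the modulus is an assumed figure for what the file could have grown to, e.g. as reported by ANALYZE.FILE):

    # DATA.30 size per the formula above: (modulus + 1) groups of
    # GROUP.SIZE * 2 KB each, where the extra group is the file header.
    def data30_bytes(current_modulus, group_size=1):
        return (current_modulus + 1) * group_size * 2048

    # Illustrative: an assumed modulus around 10,500 with the default
    # GROUP.SIZE of 1 lands near the 21.6 MB reported in the question.
    print(f"{data30_bytes(10_500) / 1e6:.1f} MB")  # ~21.5 MB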
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.