Dynamic Hashed File Overflow

Post questions here relating to DataStage Server Edition in areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

shaimil
Charter Member
Posts: 37
Joined: Fri Feb 28, 2003 5:37 am
Location: UK

Dynamic Hashed File Overflow

Post by shaimil »

Can someone please clear up the following queries on overflow in a dynamic hashed file for me.

1. When does a record get written to OVER.30 as opposed to the hashed file adding another group and resizing?

2. How are records stored in OVER.30? Are they kept in groups, or in a single group that holds all overflow records?

3. How are records larger than the group size or separation stored in both STATIC and DYNAMIC files?

4. How are large records (as specified by the LARGE RECORD SIZE parameter) stored in DYNAMIC files? Are they kept in overflow?

Thanks in advance
Shay
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Check this out:
viewtopic.php?t=85364
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
shaimil
Charter Member
Posts: 37
Joined: Fri Feb 28, 2003 5:37 am
Location: UK

Post by shaimil »

Thanks for that. It answers my first 2 queries but not the last 2. Any ideas?
3. How are records larger than the group size or separation stored in both STATIC and DYNAMIC files?

4. How are large records (as specified by the LARGE RECORD SIZE parameter) stored in DYNAMIC files? Are they kept in overflow?
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

If a record is larger than the group size of the hash file, then it has to go into an overflow group. I am not sure, but I think there is something at the end of each group that tells it the record continues. The size of a record is stored along with the @ID in the hash file. It functions more like a linked list.

If records are huge, then use a type 19 file. A hash file gets extremely slow when overflow records are continually accessed. The LARGE RECORD size can help performance a lot. If it were me, I would go to a static hash file at this point.
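
For reference, that tuning can be applied by EXECUTEing the TCL verbs from BASIC. This is only a minimal sketch, assuming a dynamic hashed file named MYHASH (a hypothetical name); the exact CONFIGURE.FILE and RESIZE options vary by release, so check HELP CONFIGURE.FILE and HELP RESIZE first:

   * Hedged sketch: MYHASH is a hypothetical hashed file name.
   * Raise LARGE.RECORD on the dynamic (type 30) file so that fewer
   * records spill into long OVER.30 buffer chains (1600 is an
   * illustrative value; keep it below the group buffer size):
   EXECUTE 'CONFIGURE.FILE MYHASH LARGE.RECORD 1600'
   * ...or convert it to a static hashed file instead (type 18,
   * modulo 401, separation 4 are illustrative values only):
   EXECUTE 'RESIZE MYHASH 18 401 4'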

To see it for yourself, create a type 18 file with a modulo of 3 and run:

od -doc FILE

at the UNIX level. Keep adding records to it and you will very quickly see how overflow works (a BASIC sketch for loading the records follows below).

"od" is the UNIX octal dump command.
-d decimal
-o octal
-c characters are displayed.
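
To drive the experiment, here is a minimal DataStage BASIC loader. TESTOVF is a hypothetical name; create it first from TCL with something like CREATE.FILE TESTOVF 18 3 1 (type 18, modulo 3, separation 1 - syntax may vary by release):

   * Hedged sketch: write fixed-size dummy records into the small
   * static file, then re-run "od -doc TESTOVF" between batches to
   * watch the groups fill up and spill into overflow.
   OPEN 'TESTOVF' TO F.TEST ELSE STOP 'Cannot open TESTOVF'
   FOR I = 1 TO 50
      WRITE STR('X', 200) TO F.TEST, 'KEY':I
   NEXT I
   CLOSE F.TEST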
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

When hashed files are empty each group is in one "buffer", the size of which is decreed by the separation parameter (static hashed files) or the GROUP.SIZE parameter (dynamic hashed files).

When a group overflows, a secondary group buffer (the same size) is used. In static hashed files this is appended to the file structure unless there is a free secondary buffer already within the file structure. In dynamic hashed files it is appended to the OVER.30 file (because the number of primary group buffers in DATA.30 may have to change) unless there is a free secondary buffer already within OVER.30.

Large records (larger than the buffer size in static hashed files, larger than the LARGE.RECORD parameter in dynamic hashed files) have an extended header (two extra 32-bit or 64-bit pointers) to a daisy chain of secondary buffers in which the data from the record are stored. The key to the record is always stored in the regular group buffer.

Only that part of a large record actually stored in the regular group buffer contributes to the "actual load" figure used in determining whether the dynamic hashed file needs to split (add a group buffer to DATA.30).

There is no real difference in storage between static and dynamic hashed files. In dynamic hashed files the secondary buffers, whether they are used for overflowed groups, oversized records, SICA block or partfile block, are all kept in OVER.30 because DATA.30 (which only contains primary group buffers) needs to grow and shrink.
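
If you want to watch these numbers, FILEINFO() in BASIC will report a dynamic hashed file's tuning values at run time. A minimal sketch, assuming the symbolic keys supplied by the standard UNIVERSE.INCLUDE FILEINFO.H include (names can vary by release) and a hypothetical file named MYHASH:

   $INCLUDE UNIVERSE.INCLUDE FILEINFO.H
   OPEN 'MYHASH' TO F.HASH ELSE STOP 'Cannot open MYHASH'
   * Each FILEINFO key below reads one tuning value from the file header.
   PRINT 'Modulus (groups in DATA.30): ' : FILEINFO(F.HASH, FINFO$MODULUS)
   PRINT 'Group size                 : ' : FILEINFO(F.HASH, FINFO$GROUPSIZE)
   PRINT 'Large record threshold     : ' : FILEINFO(F.HASH, FINFO$LARGERECORDSIZE)
   PRINT 'Split load (percent)       : ' : FILEINFO(F.HASH, FINFO$SPLITLOAD)
   PRINT 'Current (actual) load      : ' : FILEINFO(F.HASH, FINFO$CURRENTLOAD)
   CLOSE F.HASH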

I anticipate publishing a paper on hashed files in the short to medium term, provided certain legal obstacles can be overcome.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.