Static Hash Files - Different Types - What's the difference

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
MukundShastri
Premium Member
Posts: 103
Joined: Tue Oct 14, 2003 4:07 am

Static Hash Files - Different Types - What's the difference

Post by MukundShastri »

In the Hashed File stage, when the "Create File" option is checked on the "Input" tab and you click the "Options" tab, various file types are shown. Of these, I presume Type 2 (Hashed) through Type 18 (Hashed) are the static hashed file types.
Can anybody explain the difference between these 17 types of static hashed files?

There is also a Type 25 (B-Tree) file which, as the name suggests, does not use a hashing algorithm and belongs to neither the static nor the dynamic hashed file types. It uses B-Tree indexing, as in Oracle. Please correct my understanding here.


Thanks and Regards

Mukund Shastri
gherbert
Participant
Posts: 9
Joined: Mon Mar 29, 2004 7:58 am
Location: Westboro, MA

Here's a definition of the different types...

Post by gherbert »

A file type is specified as an integer in the range 2 through 18. Each file type defines a particular algorithm applied to the (significant characters of the) primary key of each record to determine the group into which the record is written.

Type and Usage:
2 Static Modulo. Hashed. Wholly numeric and significant in the last 8 chars.
3 Static Modulo. Hashed. Like #2 but contains delimiter chars (* - / etc).
4 Static Modulo. Hashed. Wholly alphabetic and significant in the last 5 chars.
5 Static Modulo. Hashed. Like #4 but with the full range of ASCII chars.
6 Static Modulo. Hashed. Wholly numeric and significant in the first 8 chars.
7 Static Modulo. Hashed. Like #6 but contains delimiter chars.
8 Static Modulo. Hashed. Wholly alphabetic and significant in the first 5 chars.
9 Static Modulo. Hashed. ASCII chars and significant in the first 4 chars.
10 Static Modulo. Hashed. Wholly numeric and significant in the last 20 chars.
11 Static Modulo. Hashed. Like #10 but contains delimiter chars.
12 Static Modulo. Hashed. Wholly alphabetic and significant in the last 16 chars.
13 Static Modulo. Hashed. Like #12 but contains delimiter chars.
14 Static Modulo. Hashed. Wholly numeric, all characters significant.
15 Static Modulo. Hashed. Like #14, but contains delimiter chars.
16 Static Modulo. Hashed. Wholly alphabetic, all characters significant.
17 Static Modulo. Hashed. Like #16, but contains ASCII chars.
18 Static Modulo. Hashed. Arbitrary form, all characters significant.

For keys which are always a single range of sequential integers,
type 2 will give you perfect rectangular hashing.

Type 18 is usually the best, or close to the best, hash type for a file, and many people just use Type 18 for all static modulo files.

There are many factors involved in determining which file type to use, but generally types 2 and 18 yield the best overall distribution of records within the file.
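To make the idea concrete, here is a minimal Python sketch of how static modulo hashing assigns records to groups. The function name, the modulus of 10, and the simplified "treat the key as a number" step are my own illustrative assumptions, not DataStage internals; the real engine works on the raw bytes of the key.

```python
# Hypothetical sketch of static "modulo" hashing: the group a record
# lands in is derived from its key, reduced modulo the file's modulus.
# Mimics a type-2-style file: numeric key, last 8 chars significant.
# Names and simplifications here are assumptions, not DataStage code.

def group_for_key(key: str, modulus: int) -> int:
    """Return the group number for a numeric key (last 8 chars count)."""
    significant = key[-8:]          # only the trailing 8 chars matter
    return int(significant) % modulus

# Sequential integer keys spread perfectly evenly ("rectangular"):
groups = [group_for_key(str(k), 10) for k in range(1000, 1020)]
# each of the 10 groups receives exactly 2 of these 20 keys
```

This also illustrates the earlier point about Type 2 and sequential integer keys: consecutive integers taken modulo the file's modulus cycle through every group in turn, so the records distribute perfectly evenly.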

Hope this helps!
gherbert
Participant
Posts: 9
Joined: Mon Mar 29, 2004 7:58 am
Location: Westboro, MA

Post by gherbert »

Forgot about your Type 25 (B-tree) file query.

This file type is NOT a binary tree, as many believe. Rather, it is an N-way branch tree, consisting of internal and terminating nodes. Internal nodes contain up to 383 pointers to other (internal or terminating) nodes. A terminating node contains up to 128 data records.
Each node is 8K in size. This N-way branch tree was chosen for the primary indexing structure because a three-tier tree (internal node -> 383 internal nodes -> 14,554 terminal nodes) results in a maximum of 3 disk accesses for up to 18,874,368 data records.

This file does NOT utilize a hashing algorithm, but a linear search. A given node is searched from its first entry until a match is made or a value greater than the searched-for value is reached. If a pointer to an internal node is matched, that node is retrieved and searched in turn. While this search may seem slow, there are optimizations within the structure and the scan code that drastically reduce search time.
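The search described above can be sketched in a few lines of Python. This is a toy model only: the node layout, field names, and recursion are my illustrative assumptions, not the actual Type 25 on-disk format, and none of the real optimizations are shown.

```python
# Toy sketch of searching an N-way branch tree node linearly.
# Node layout and names are illustrative assumptions, not the
# actual Type 25 file structure.

from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list                                     # sorted key values
    children: list = field(default_factory=list)   # empty => terminal node
    records: dict = field(default_factory=dict)    # key -> record (terminal)

def search(node: Node, key):
    """Scan entries in order until a match or a greater value is seen."""
    if not node.children:                  # terminal node: linear scan
        for k in node.keys:
            if k == key:
                return node.records[k]
            if k > key:                    # passed where key would sit
                break
        return None
    for i, k in enumerate(node.keys):      # internal node: pick a branch
        if key <= k:
            return search(node.children[i], key)
    return search(node.children[-1], key)  # key greater than all separators

# Usage: a two-level tree with one internal node and two terminal nodes.
leaf1 = Node(keys=[1, 3], records={1: "a", 3: "b"})
leaf2 = Node(keys=[7, 9], records={7: "c", 9: "d"})
root = Node(keys=[5], children=[leaf1, leaf2])
```

Searching `root` for key 7 descends the right branch and scans `leaf2` linearly until it hits the match; a miss stops as soon as a larger value is seen, so the scan never reads past where the key would have been.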

Hope this helps.
Post Reply