Duplicates in Hash File

Posted: Thu Jul 21, 2005 8:12 am
by I_Server_Whale
Hi All,
How does a hash file handle duplicate records? My understanding is that if a hash file has a key, then a record coming in with the same key overwrites the existing record.
If it doesn't have a key, then it keeps the incoming record even though it is a duplicate.
But does it make sense to store information in a hash file with records that don't have a key (or keys)?

Thanks,
Naveen.

Posted: Thu Jul 21, 2005 8:16 am
by chulett
No such thing as duplicates in a hash file. No such thing as a hash file with no keys.

Posted: Thu Jul 21, 2005 9:24 am
by DaleK
It has been my experience that the second record (the duplicate) overwrites the first record, so the last record with that key value is kept.

chulett - is this correct?


Dale

Posted: Thu Jul 21, 2005 9:27 am
by chulett
Yes, it's called Destructive Overwrite. Last one in wins.
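To make the "last one in wins" behaviour concrete, here is a minimal sketch using an ordinary Python dict (illustrative only, not DataStage BASIC): writing a record under an existing key silently replaces the old record, exactly as the hashed file does.

```python
# Sketch of "destructive overwrite" semantics using a plain dict.
# The dict key models the hash file's key column(s); the value is the record.
hashed_file = {}

records = [
    ("CUST001", {"name": "Smith", "balance": 100}),
    ("CUST002", {"name": "Jones", "balance": 250}),
    ("CUST001", {"name": "Smith", "balance": 175}),  # duplicate key
]

for key, record in records:
    hashed_file[key] = record  # same key: old record is destroyed

# Only the last record written for CUST001 survives,
# and the file holds exactly one record per key value.
print(hashed_file["CUST001"]["balance"])  # 175
print(len(hashed_file))                   # 2
```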

Re: Duplicates in Hash File

Posted: Fri Jul 22, 2005 6:20 am
by talk2shaanc
naveendronavalli wrote: Hi All,
How does a hash file handle duplicate records? My understanding is that if a hash file has a key, then a record coming in with the same key overwrites the existing record.
If it doesn't have a key, then it keeps the incoming record even though it is a duplicate.
But does it make sense to store information in a hash file with records that don't have a key (or keys)?

Thanks,
Naveen.


Your first assumption is correct. If a record comes in and its key is already present in the hash file, the old record gets overwritten.

But your second assumption is wrong.
A few details about hash files, related to your assumptions:
1. A hash file is nothing but index-based storage of data.
2. To create an index you need key columns. So to create a hash file through DataStage, you will have to define at least one key column; otherwise your job will give a compilation error.
3. For creating indexes, there are different hashing algorithms.

Posted: Fri Jul 22, 2005 4:55 pm
by ray.wurlod
You have missed the vital point. There is NO index on the key. That's where the speed comes from; the key value is processed by a function (the "hashing algorithm") that returns the exact address of the page on which the record resides. Exactly one logical I/O is required to retrieve the record (unless the record is oversized or its group is overflowed).
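The point can be sketched as follows (illustrative Python only; the byte-sum-modulo function and group count below are stand-ins I have assumed, not any of the real DataStage/UniVerse hashing algorithms): the key value alone is transformed directly into a group (page) address, so retrieval never walks an index, and it also means a given key can live in exactly one place, which is why duplicates are impossible.

```python
# Illustrative hashed lookup: the key is fed to a hashing function
# that computes the group (page) address directly; no index exists.
GROUP_COUNT = 7  # the file's "modulo": number of groups (pages)

def group_address(key: str) -> int:
    """Map a key straight to its group number in one step (toy function)."""
    return sum(key.encode()) % GROUP_COUNT

groups = [[] for _ in range(GROUP_COUNT)]  # the file's pages

def write(key, record):
    g = groups[group_address(key)]
    for i, (k, _) in enumerate(g):
        if k == key:
            g[i] = (key, record)  # key already here: destructive overwrite
            return
    g.append((key, record))

def read(key):
    # Exactly one group is examined: one logical I/O.
    for k, record in groups[group_address(key)]:
        if k == key:
            return record
    return None

write("CUST001", "Smith")
write("CUST001", "Smith-updated")  # same key hashes to the same group
print(read("CUST001"))  # Smith-updated
```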

Posted: Sun Jul 24, 2005 12:31 pm
by I_Server_Whale
Thanks a million for getting that absolutely straight. Yes! It is always the hashing algorithm. Thanks, Ray.

Naveen.