Duplicates in Hash File

I_Server_Whale · Post by **I_Server_Whale** » Thu Jul 21, 2005 8:12 am

Hi All,
How does a hash file handle duplicate records? My understanding is that if a hash file has a key, then an incoming record with the same key overwrites the existing record.
If it doesn't have a key, then it keeps the incoming record even though it is a duplicate.
But does it make sense to store information in a hash file with records that don't have a key (or keys)?

Thanks,
Naveen.

chulett · Post by **chulett** » Thu Jul 21, 2005 8:16 am

No such thing as duplicates in a hash file. No such thing as a hash file with no keys.

DaleK · Post by **DaleK** » Thu Jul 21, 2005 9:24 am

In my experience, the second record (the duplicate) overwrites the first, so the last record with that key value is the one that is kept.

chulett - is this correct?

Dale

chulett · Post by **chulett** » Thu Jul 21, 2005 9:27 am

Yes, it's called Destructive Overwrite. Last one in wins.
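To picture "last one in wins", here is a minimal Python sketch that models the hash file as a plain dict keyed on the key columns. This is only an analogy, not how DataStage stores anything; `load_hashed_file` and the `cust_id` / `name` columns are made-up names for illustration.

```python
def load_hashed_file(rows, key_columns):
    """Model a hash file as a dict keyed on the key columns (an analogy only)."""
    hashed_file = {}
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        # Writing to an existing key replaces the whole record:
        # this is the destructive overwrite.
        hashed_file[key] = row
    return hashed_file


rows = [
    {"cust_id": 1, "name": "first"},
    {"cust_id": 2, "name": "other"},
    {"cust_id": 1, "name": "second"},  # same key as the first row
]
result = load_hashed_file(rows, ["cust_id"])
```

Three rows go in, but `result` holds only two records, and the one stored under key `(1,)` is the row that arrived last.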

talk2shaanc · Post by **talk2shaanc** » Fri Jul 22, 2005 6:20 am

naveendronavalli wrote:Hi All,
How does a hash file handle duplicate records? My understanding is that if a hash file has a key, then an incoming record with the same key overwrites the existing record.
If it doesn't have a key, then it keeps the incoming record even though it is a duplicate.
But does it make sense to store information in a hash file with records that don't have a key (or keys)?

Thanks,
Naveen.

Your first assumption is correct. If a record arrives and its key is already present in the hash file, the old record gets overwritten.

But your second assumption is wrong.
A few details about hash files, related to your assumptions:
1. A hash file is nothing but index-based storage of data.
2. To create an index you need key columns. So to create a hash file through DS, you have to define at least one key column; otherwise your job will give a compilation error.
3. There are different hashing algorithms for creating the indexes.

ray.wurlod · Post by **ray.wurlod** » Fri Jul 22, 2005 4:55 pm

You have missed the vital point. There is NO index on the key. That's where the speed comes from; the key value is processed by a function (the "hashing algorithm") that returns the exact address of the page on which the record resides. Exactly one logical I/O is required to retrieve the record (unless the record is oversized or its group is overflowed).
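A rough Python sketch of that idea, for anyone who wants to see it in code. The assumptions are mine: `ToyHashedFile`, `modulus` and `group_of` are illustrative names, and `zlib.crc32` is only a stand-in for the real hashing algorithms, which this sketch does not reproduce. Oversized records and group overflow are left out.

```python
import zlib


class ToyHashedFile:
    """Toy model: a fixed number of groups, addressed directly by a hash of the key."""

    def __init__(self, modulus=7):
        self.modulus = modulus
        self.groups = [[] for _ in range(modulus)]

    def group_of(self, key):
        # The key itself is turned into a group address; no index is consulted.
        # crc32 is a stand-in here, not the actual hashing algorithm.
        return zlib.crc32(key.encode("utf-8")) % self.modulus

    def write(self, key, record):
        group = self.groups[self.group_of(key)]
        for i, (existing_key, _) in enumerate(group):
            if existing_key == key:
                group[i] = (key, record)  # destructive overwrite
                return
        group.append((key, record))

    def read(self, key):
        # Only the one group the key hashes to is ever scanned.
        for existing_key, record in self.groups[self.group_of(key)]:
            if existing_key == key:
                return record
        return None


hf = ToyHashedFile()
hf.write("A1", "first version")
hf.write("A1", "second version")
```

After the two writes, `hf.read("A1")` returns the second version and the file still holds a single record, because both writes hashed to the same group and matched the same key.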

I_Server_Whale · Post by **I_Server_Whale** » Sun Jul 24, 2005 12:31 pm

Thanks a million for getting that absolutely straight. Yes! It is always the hashing algorithm. Thanks Ray.

Naveen.
