Duplicates in Hash File


I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America


Post by I_Server_Whale »

Hi All,
How does a hash file handle duplicate records? My understanding is that if a hash file has a key, then an incoming record with the same key overwrites the existing record.
If it doesn't have a key, then it keeps the incoming record even though it is a duplicate.
But does it make sense to store information in a hash file with records that don't have a key (or keys)?

Thanks,
Naveen.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No such thing as duplicates in a hash file. No such thing as a hash file with no keys.
-craig

"You can never have too many knives" -- Logan Nine Fingers
DaleK
Premium Member
Posts: 68
Joined: Fri Jun 27, 2003 8:33 am
Location: Orlando

Post by DaleK »

It has been my experience that the second record (the duplicate) overwrites the first record, so the last record with that key value is kept.

chulett - is this correct?


Dale
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yes, it's called Destructive Overwrite. Last one in wins.
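As a rough illustration of "last one in wins", here is a minimal Python sketch. It models the hashed file as a plain dictionary keyed on the key columns, which is only an analogy and not how DataStage stores data; the function name `load_hashed_file` and the sample columns `cust_id` and `name` are hypothetical.

```python
def load_hashed_file(rows, key_columns):
    """Write rows in arrival order; a row whose key already exists replaces the old one."""
    store = {}
    for row in rows:
        # Build the key from the key column values (hypothetical column names).
        key = tuple(row[col] for col in key_columns)
        # Destructive overwrite: no duplicate check, the newer row simply wins.
        store[key] = row
    return store


rows = [
    {"cust_id": 1, "name": "first"},
    {"cust_id": 2, "name": "only"},
    {"cust_id": 1, "name": "second"},  # same key as the first row
]
result = load_hashed_file(rows, ["cust_id"])
```

After the load, `result` holds two records, and the one for `cust_id` 1 is the row that arrived last.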
-craig

talk2shaanc
Charter Member
Posts: 199
Joined: Tue Jan 18, 2005 2:50 am
Location: India

Re: Duplicates in Hash File

Post by talk2shaanc »

naveendronavalli wrote:Hi All,
How does a hash file handle duplicate records. My understanding is that if a hash file has a key, then a record coming in with the same key overwrites the existing record.
If it doesn't have a key, then it keeps the incoming record even though it is a duplicate.
But does it make sense to store information in a hash file with records that don't have a key(or keys).

Thanks,
Naveen.


Your first assumption is correct. If a record arrives and its key is already present in the hash file, the old record is overwritten.

But your second assumption is wrong.
A few details about hash files related to your assumptions:
1. A hash file is nothing but index-based storage of data.
2. To create an index you need key columns. So to create a hash file through DS, you have to define at least one key column; otherwise your job will give a compilation error.
3. There are different hashing algorithms for creating indexes.
Shantanu Choudhary
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You have missed the vital point. There is NO index on the key. That's where the speed comes from; the key value is processed by a function (the "hashing algorithm") that returns the exact address of the page on which the record resides. Exactly one logical I/O is required to retrieve the record (unless the record is oversized or its group is overflowed).
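To make the idea concrete, here is a small Python sketch of a file with a fixed number of groups, where the group address is computed directly from the key rather than looked up in an index. Everything in it is an assumption for illustration: the `MODULUS` value, the `HashedFile` class, and the use of `zlib.crc32` as a stand-in hash. It is not the hashing algorithm DataStage actually uses, and it ignores oversized records and overflow.

```python
import zlib

MODULUS = 7  # hypothetical number of groups in the file


def group_address(key):
    """Compute the group number straight from the key; no index is consulted."""
    # crc32 is only a deterministic stand-in for the real hashing algorithm.
    return zlib.crc32(key.encode("utf-8")) % MODULUS


class HashedFile:
    def __init__(self):
        # One list of (key, record) pairs per group.
        self.groups = [[] for _ in range(MODULUS)]

    def write(self, key, record):
        group = self.groups[group_address(key)]
        for i, (existing_key, _) in enumerate(group):
            if existing_key == key:
                group[i] = (key, record)  # same key: overwrite in place
                return
        group.append((key, record))

    def read(self, key):
        # Go directly to the one group the key hashes to and scan only that group.
        for existing_key, record in self.groups[group_address(key)]:
            if existing_key == key:
                return record
        return None


hf = HashedFile()
hf.write("CUST001", "first version")
hf.write("CUST002", "another record")
hf.write("CUST001", "second version")
```

Reading `"CUST001"` touches a single group and returns the most recently written record, which is the same overwrite behaviour discussed earlier in the thread.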
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

Thanks a million for getting that absolutely straight. Yes! It is always the hashing algorithm. Thanks Ray.

Naveen.