understanding Hash partition

mekrreddy · Post by **mekrreddy** » Fri Oct 29, 2010 2:20 pm

Hi,

I want to understand how hash partitioning works. Here is what confuses me: if I use hash partitioning, records are partitioned based on the hash key provided, and records with similar hash values go to one partition. Let's say I end up with 20 thousand partitions. How are they assigned to each node if I am running with a 4-node configuration file?
(Correct me if I am wrong about the number of partitions being created.)
Please share your thoughts on this.

Appreciate your help.

ray.wurlod · Post by **ray.wurlod** » Fri Oct 29, 2010 5:40 pm

This is a simplistic explanation but, I trust, a comprehensible one.

The Hash algorithm adds together the values of all the characters in the key, then divides the total by the number of nodes. The remainder of that integer division is the node number to which the record is directed.

The unstated complexity is a "bit rotate" operation after each character to get a better level of randomness (= evenness of spread) over the available nodes.
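
As a rough illustration of the description above, here is a minimal Python sketch. It is not DataStage's actual implementation: the 32-bit width, the one-bit left rotation, and the names `rotate_left_32`, `simple_hash` and `node_for` are all assumptions made for the example.

```python
def rotate_left_32(value, bits=1):
    """Rotate a 32-bit unsigned value left by `bits` (width is an assumption)."""
    value &= 0xFFFFFFFF
    return ((value << bits) | (value >> (32 - bits))) & 0xFFFFFFFF


def simple_hash(key):
    """Add each character's code point, rotating the running total after each one."""
    h = 0
    for ch in key:
        h = rotate_left_32(h + ord(ch))
    return h


def node_for(key, node_count):
    """The remainder after dividing by the node count is the node number."""
    return simple_hash(key) % node_count


if __name__ == "__main__":
    for key in ["SMITH", "JONES", "SMITH"]:
        print(key, "-> node", node_for(key, 4))
```

Without the rotation, keys made of the same characters in a different order would always add up to the same total; rotating after each character makes the result depend on character order as well.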

hitmanthesilentassasin · Mon Nov 08, 2010 11:01 am

Ray, could you please elaborate on this definition? Can you say more about "the values of all the characters in the key"? I can't find bit rotate mentioned anywhere else. Could you please help me understand this better?

ray.wurlod · Post by **ray.wurlod** » Mon Nov 08, 2010 4:23 pm

Values = the ASCII or Unicode code points.

You will not read anything about bit rotate in any DataStage manual, but you should find it in any decent C programming manual.

Really, you don't need to know. All you need to know is that it works, and that the essence of the Hash partitioning algorithm is:

hashvalue = f(keyvalue)
nodenumber = Mod(hashvalue,nodecount)
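
To tie this back to the original question: under the formula above, the node number can only range from 0 to nodecount - 1, so a 4-node configuration file gives 4 partitions no matter how many distinct key values there are. The Python sketch below spreads 20,000 made-up key values over 4 nodes. `zlib.crc32` is only a stand-in for `f` (an assumption, not what DataStage uses), and the `CUST…` keys and the `node_number` name are hypothetical.

```python
import zlib
from collections import Counter

NODE_COUNT = 4  # e.g. a 4-node configuration file


def node_number(key_value, node_count=NODE_COUNT):
    # Stand-in for f(keyvalue); CRC32 is an assumption for illustration only.
    hash_value = zlib.crc32(key_value.encode("utf-8"))
    return hash_value % node_count


# 20,000 distinct, hypothetical key values
keys = [f"CUST{i:05d}" for i in range(20000)]
rows_per_node = Counter(node_number(k) for k in keys)

for node in sorted(rows_per_node):
    print(f"node {node}: {rows_per_node[node]} rows")
```

Every row with a given key value lands on the same node, because the same key always produces the same hash value and therefore the same remainder.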