understanding Hash partition

mekrreddy
Participant
Posts: 88
Joined: Wed Oct 08, 2008 11:12 am

Post by mekrreddy »

Hi,

I want to understand how hash partitioning works. Here is what confuses me: if I use hash partitioning, records are partitioned based on the hash key provided, and records with the same hash value go to the same partition. Let's say I have 20 thousand partitions; how are they assigned to each node if I am running on a 4-node configuration file?
(Correct me if I am wrong about the number of partitions being created.)
Please share your thoughts on this.

Appreciate your help.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

This is a simplistic explanation but, I trust, a comprehensible one.

The Hash algorithm adds together the values of all the characters in the key, then divides the total by the number of nodes using integer division. The remainder of that division is the node number to which the record is directed.

The unstated complexity is a "bit rotate" operation after each character to get a better level of randomness (= evenness of spread) over the available nodes.
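
To make the description above concrete, here is a minimal Python sketch of a "sum the characters, rotate, then take the remainder" hash. It is not DataStage's actual implementation: the names `rotate_left_32`, `toy_hash` and `node_for`, the 32-bit width, the one-bit rotation, and applying the rotation between characters are all assumptions chosen for illustration.

```python
def rotate_left_32(value: int, bits: int = 1) -> int:
    """Rotate a 32-bit unsigned integer left by `bits` positions."""
    value &= 0xFFFFFFFF
    return ((value << bits) | (value >> (32 - bits))) & 0xFFFFFFFF


def toy_hash(key: str) -> int:
    """Add up the code points of the key, rotating the running total
    between characters (assumed width and rotation amount)."""
    total = 0
    for ch in key:
        total = rotate_left_32(total)           # mix in character position
        total = (total + ord(ch)) & 0xFFFFFFFF  # add the character's code point
    return total


def node_for(key: str, node_count: int) -> int:
    """The remainder after dividing by the node count picks the node."""
    return toy_hash(key) % node_count


if __name__ == "__main__":
    for key in ("AB", "BA"):
        print(key, toy_hash(key), node_for(key, 4))
```

A plain sum would give "AB" and "BA" the same hash value, because addition ignores character order. With the rotation in place the two keys hash differently, which illustrates why an extra mixing step helps spread keys over the nodes.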
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
hitmanthesilentassasin
Participant
Posts: 150
Joined: Tue Mar 13, 2007 1:17 am

Post by hitmanthesilentassasin »

Ray - could you please elaborate on this definition? Can you say more about "the values of all the characters in the key"? I also can't find bit rotate mentioned anywhere else. Could you please help me understand this better?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Values = the ASCII or Unicode code points.

You will not read anything about bit rotation in any DataStage manual, but you should find it in any decent C programming manual.

Really, you don't need to know. All you need to know is that it works, and that the essence of the Hash partitioning algorithm is:

hashvalue = f(keyvalue)
nodenumber = Mod(hashvalue,nodecount)
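
Taken at face value, those two lines also answer the original question: the result of the Mod can only be 0 to nodecount - 1, so there are as many partitions as nodes, and 20 thousand distinct key values end up sharing 4 partitions. The Python sketch below illustrates this. It uses `zlib.crc32` purely as a stand-in for f; that choice, the helper names `partition_for` and `spread`, and the made-up `CUST...` keys are assumptions, not what DataStage does internally.

```python
import zlib
from collections import Counter


def partition_for(key: str, node_count: int) -> int:
    """nodenumber = Mod(f(keyvalue), nodecount), with CRC-32 standing in for f."""
    hash_value = zlib.crc32(key.encode("utf-8"))
    return hash_value % node_count


def spread(keys, node_count: int) -> Counter:
    """Count how many keys land on each node number."""
    return Counter(partition_for(key, node_count) for key in keys)


# 20,000 distinct, made-up key values on a 4-node configuration.
keys = [f"CUST{i:05d}" for i in range(20000)]
counts = spread(keys, 4)

if __name__ == "__main__":
    for node in sorted(counts):
        print(f"node {node}: {counts[node]} keys")
```

Because the function is deterministic, every record with the same key value is sent to the same node, however many records carry that key.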