Hi,
I want to understand how hash partitioning works. Here is what I am confused about. If I use hash partitioning, records are partitioned based on the hash key provided, and records with the same hash value go to the same partition. Let's say I have 20 thousand partitions: how are they assigned to each node if I am running with a 4-node configuration file?
(Correct me if I am wrong about the number of partitions being created.)
Please share your thoughts on this.
Appreciate your help.
understanding Hash partition
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
This is a simplistic explanation but, I trust, a comprehensible one.
The Hash algorithm adds together the values of all the characters in the key, then divides the total by the number of nodes. The remainder of that integer division is the node number to which the record is directed.
The unstated complexity is a "bit rotate" operation after each character to get a better level of randomness (= evenness of spread) over the available nodes.
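To make the description above concrete, here is a minimal sketch in Python. It is only an illustration of "add the code points, rotate the bits after each character, take the remainder"; the function names, the 32-bit width and the one-bit rotation are assumptions made for the example, not the actual DataStage implementation.

```python
def rotate_left_32(value: int, bits: int = 1) -> int:
    """Rotate a 32-bit unsigned integer left by `bits` positions."""
    value &= 0xFFFFFFFF
    return ((value << bits) | (value >> (32 - bits))) & 0xFFFFFFFF


def toy_hash(key: str) -> int:
    """Add each character's code point, rotating the running total after each one."""
    total = 0
    for ch in key:
        total = (total + ord(ch)) & 0xFFFFFFFF  # add the character value
        total = rotate_left_32(total)           # the "bit rotate" step
    return total


def node_for(key: str, node_count: int) -> int:
    """The remainder of the integer division is the node number."""
    return toy_hash(key) % node_count


if __name__ == "__main__":
    for key in ("AB", "BA"):
        print(key, toy_hash(key), node_for(key, 4))
```

Without the rotate step, "AB" and "BA" would add up to the same total and always land on the same node. With it, the order of the characters affects the result, which is what gives the better spread.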
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 150
- Joined: Tue Mar 13, 2007 1:17 am
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Values = the ASCII or Unicode code points.
You will not read anything about bitrotate in any DataStage manual, but should find it in any decent C programming manual.
Really, you don't need to know. All you need to know is that it works, and that the essence of the Hash partitioning algorithm is:
hashvalue = f(keyvalue)
nodenumber = Mod(hashvalue,nodecount)
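
The two lines above also answer the question about the number of partitions: because the node number is a remainder after dividing by the node count, a 4-node configuration gives partitions 0 to 3, however many distinct key values there are. The Python sketch below shows this with 20,000 made-up keys. `zlib.crc32` is used purely as a stand-in for f(keyvalue), and the key format is hypothetical; neither is what DataStage uses internally.

```python
import zlib
from collections import Counter


def node_number(key_value: str, node_count: int) -> int:
    # Stand-in for f(keyvalue): any deterministic hash will do for the illustration.
    hash_value = zlib.crc32(key_value.encode("utf-8"))
    # nodenumber = Mod(hashvalue, nodecount)
    return hash_value % node_count


node_count = 4  # e.g. a 4-node configuration file
keys = [f"CUST{i:05d}" for i in range(20000)]  # 20,000 distinct, made-up key values

# Count how many keys are directed to each node number.
rows_per_node = Counter(node_number(k, node_count) for k in keys)

if __name__ == "__main__":
    for node in sorted(rows_per_node):
        print(f"node {node}: {rows_per_node[node]} keys")
```

The same key value always produces the same node number, so all records sharing a key end up in the same partition.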