ABOUT HASH PARTITIONING

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


mcs_dineshm
Participant
Posts: 12
Joined: Sun Sep 23, 2007 12:21 am
Location: chennai

ABOUT HASH PARTITIONING

Post by mcs_dineshm »

Hi, can anyone tell me how hash partitioning works?
dineshm
Maveric
Participant
Posts: 388
Joined: Tue Mar 13, 2007 1:28 am

Post by Maveric »

I've heard it's based on a hash algorithm.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:wink:

Google the term, plenty of sources out there that explain what it is and how it works.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Minhajuddin
Participant
Posts: 467
Joined: Tue Mar 20, 2007 6:36 am
Location: Chennai

Post by Minhajuddin »

Go through the Parallel Job Developer's Guide.
Minhajuddin

abhi989
Participant
Posts: 28
Joined: Mon Sep 19, 2005 2:31 pm

Post by abhi989 »

Hash is a key-based partitioning algorithm. It can be used with a key of any data type. The bytes (or characters) making up the key are processed through a function that yields a positive integer called a hash value. This number is divided by the number of partitions, and the remainder is the number of the node (partition) where that key value belongs. So for every distinct key value, all instances end up in the same partition.
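
To make the "hash, then take the remainder" idea concrete, here is a minimal Python sketch. It is only an illustration: the function name hash_partition, the sample rows, and the use of CRC-32 as the hash function are all assumptions made for the example. The actual hash function DataStage uses is not published, so real jobs will not produce these partition numbers.

```python
import zlib


def hash_partition(key, num_partitions):
    """Map a key value to a partition number in the range 0..num_partitions-1.

    CRC-32 is a stand-in hash chosen for illustration only; it is NOT
    the function DataStage uses internally.
    """
    key_bytes = str(key).encode("utf-8")   # the bytes making up the key
    hash_value = zlib.crc32(key_bytes)     # non-negative integer hash value
    return hash_value % num_partitions     # remainder = partition number


# Hypothetical sample rows: (customer_id, amount), keyed on customer_id.
rows = [("C001", 10), ("C002", 20), ("C001", 30), ("C003", 40), ("C002", 50)]

NUM_PARTITIONS = 4
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for row in rows:
    partitions[hash_partition(row[0], NUM_PARTITIONS)].append(row)

for p, contents in partitions.items():
    print(p, contents)
```

Whichever partition "C001" lands in, both of its rows land there together, which is the property that matters for key-based stages.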

Hope this helps!!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Hashing is a widely-used technique for reliably (repeatably) selecting one from a finite number of alternatives based on a given value.

In parallel jobs, hash partitioning chooses one from the finite number of processing nodes based on the combination of values provided as "key" on the partitioning tab.

Unless you have the actual partitioning algorithm code (which you don't), you cannot predict which node will be chosen for any particular key value, except in extremely simple cases. However, the creators of DataStage are particularly proficient at writing good hashing algorithms that yield a reasonably even spread.
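
As a rough illustration of hashing on a combination of key columns, the Python sketch below joins the column values, hashes the result, and counts how many rows each node receives so the spread can be inspected. Everything in it is an assumption made for the example: the name partition_for, the separator byte, the generated keys, and CRC-32 as the hash. It does not reproduce the DataStage algorithm, so it cannot tell you which node a real job will choose.

```python
import zlib
from collections import Counter


def partition_for(key_columns, num_nodes):
    """Choose a node from the combined values of several key columns.

    Hypothetical helper: CRC-32 and the unit-separator join are
    illustrative choices, not the DataStage implementation.
    """
    # Join with a separator so ("ab", "c") and ("a", "bc") hash differently.
    combined = "\x1f".join(str(value) for value in key_columns)
    return zlib.crc32(combined.encode("utf-8")) % num_nodes


NUM_NODES = 4

# Hypothetical two-column keys: (customer code, region number).
keys = [(f"CUST{i:05d}", i % 7) for i in range(10_000)]

# Count how many keys each node receives.
counts = Counter(partition_for(key, NUM_NODES) for key in keys)
for node in range(NUM_NODES):
    print(f"node {node}: {counts[node]} rows")
```

Running it twice gives the same counts, which is the "repeatable" part; how even the counts are depends on the hash function and on the key values.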
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.