ABOUT HASH PARTITIONING

mcs_dineshm · Post by **mcs_dineshm** » Tue Sep 25, 2007 8:36 am

Hi, can anyone tell me how hash partitioning works?

Maveric · Post by **Maveric** » Tue Sep 25, 2007 8:39 am

I've heard it's based on a hash algorithm.

chulett · Post by **chulett** » Tue Sep 25, 2007 8:42 am

Google the term; there are plenty of sources out there that explain what it is and how it works.

Minhajuddin · Post by **Minhajuddin** » Tue Sep 25, 2007 8:46 am

Go through the Parallel Job Developer's Guide.

abhi989 · Post by **abhi989** » Tue Sep 25, 2007 1:41 pm

Hash is a key-based partitioning algorithm. It can be used with a key of any data type. The bytes (or characters) making up the key are processed through a function that yields a positive integer called a hash value. This number is divided by the number of partitions, and the remainder is the node number (partition) where that key value belongs. So for every distinct key value, all instances will end up in the same partition.
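As a rough illustration of the steps above, here is a minimal Python sketch. It is not the DataStage implementation: `zlib.crc32` is only a stand-in hash function, and the name `hash_partition` and the sample rows are made up for the example.

```python
import zlib

def hash_partition(key, num_partitions):
    """Return a 0-based partition number for a key value (illustrative only)."""
    # Turn the key into bytes so any data type can be hashed the same way.
    key_bytes = str(key).encode("utf-8")
    # crc32 is an assumed stand-in; it returns a non-negative integer.
    hash_value = zlib.crc32(key_bytes)
    # The remainder after dividing by the partition count picks the partition.
    return hash_value % num_partitions

# Hypothetical rows of (customer key, amount).
rows = [("C001", 10), ("C002", 25), ("C001", 40), ("C003", 5), ("C002", 60)]

partitions = {}
for key, amount in rows:
    partitions.setdefault(hash_partition(key, 4), []).append((key, amount))

for number, contents in sorted(partitions.items()):
    print(number, contents)
```

Whatever partition numbers come out, both "C001" rows land together, and so do both "C002" rows, which is the property that matters for key-based stages.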

Hope this helps!!

ray.wurlod · Post by **ray.wurlod** » Tue Sep 25, 2007 4:41 pm

Hashing is a widely-used technique for reliably (repeatably) selecting one from a finite number of alternatives based on a given value.

In parallel jobs, hash partitioning chooses one from the finite number of processing nodes based on the combination of values provided as "key" on the partitioning tab.

Unless you have the actual partitioning algorithm code (which you don't), you cannot predict which node will be chosen for any particular key value, except in extremely simple cases. However, the creators of DataStage are particularly proficient at writing good hashing algorithms that yield a reasonably even spread.
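To get a feel for both points (a combination of key columns picks the node, and a decent hash spreads keys fairly evenly), you could try something like the Python sketch below. It assumes `zlib.crc32` as a stand-in, since the real algorithm is not available; `node_for`, `spread`, and the sample keys are hypothetical names and data.

```python
import zlib
from collections import Counter

def node_for(key_columns, num_nodes):
    """Pick a node from the combined values of all key columns (illustrative)."""
    # Join the column values with a separator so ("ab", "c") differs from ("a", "bc").
    combined = "\x1f".join(str(v) for v in key_columns).encode("utf-8")
    # crc32 is an assumed stand-in for the unpublished hash function.
    return zlib.crc32(combined) % num_nodes

def spread(keys, num_nodes):
    """Count how many keys land on each node."""
    counts = Counter(node_for(k, num_nodes) for k in keys)
    return [counts.get(n, 0) for n in range(num_nodes)]

# Hypothetical two-column keys: (region, customer number).
keys = [(region, cust) for region in ("EU", "US", "APAC") for cust in range(1000)]

print(spread(keys, 4))
```

Running it repeatedly gives the same counts each time, because the choice depends only on the key values and the number of nodes. Changing the number of nodes changes where keys go.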