Hash Partitioning on columns with same values

Madhu1981 · Post by **Madhu1981** » Thu Aug 24, 2006 7:36 am

Hi All,

I have a configuration file with 4 nodes. I have a job where I need to hash partition on one column (call it A). A million records come from the source, and every record has the same value in column A.

When I perform hash partitioning, will the data be spread across the 4 nodes, or will all of it move to one node?
Kindly clarify.

Thanks in advance.

thumsup9 · Post by **thumsup9** » Thu Aug 24, 2006 8:10 am

Just copied this from the dsx pdf:

Although the data is distributed across partitions, the hash partitioner ensures that records with identical keys are in the same partition, allowing duplicates to be found.
Hash partitioning does not necessarily result in an even distribution of data between partitions. For example, if you hash partition a data set based on a zip code field, where a large percentage of your records are from one or two zip codes, you can end up with a few partitions containing most of your records. This behavior can lead to bottlenecks because some nodes are required to process more records than other nodes.
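
To make the zip code example concrete, here is a minimal Python sketch. It is an illustration only: `hash_partition` and `partition_counts` are made-up helper names, CRC32 is a stand-in for whatever hash function the engine really uses, and the data (90% of records sharing one zip code) is invented.

```python
import zlib
from collections import Counter

def hash_partition(key: str, num_partitions: int) -> int:
    # Stand-in hash (CRC32). Assumption: the engine's real hash function
    # is not given in this thread; any deterministic hash shows the effect.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partition_counts(keys, num_partitions=4):
    # Count how many records land on each partition.
    counts = Counter(hash_partition(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]

# Hypothetical data: 900 records share one zip code, 100 are distinct.
zip_codes = ["10001"] * 900 + [str(20000 + i) for i in range(100)]
skewed = partition_counts(zip_codes)
print(skewed)  # one partition holds at least 900 of the 1000 records
```

Whichever partition the dominant zip code hashes to ends up with at least 900 records, so that node does most of the work.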

kcbland · Post by **kcbland** » Thu Aug 24, 2006 8:13 am

All rows go to one node. Hash means same values stay together on a node.

ray.wurlod · Post by **ray.wurlod** » Fri Aug 25, 2006 1:45 am

The reason is that every row has the same value in A, so every row generates the same hash value.

It's the same as in SQL - if you group by a column that contains only one distinct value you will end up with one group.
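
A minimal sketch of that, in Python rather than DataStage. The hash here is CRC32, chosen only as a deterministic stand-in (an assumption, not the engine's actual function), and the row count is scaled down from the million in the question.

```python
import zlib

def hash_partition(key, num_partitions=4):
    # Same input always gives the same hash, hence the same partition.
    # CRC32 is an assumed stand-in for the engine's hash function.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions

# Every record carries the same value in column A (value is hypothetical).
records = ["SAME_VALUE"] * 100_000
used_partitions = {hash_partition(k) for k in records}
print(len(used_partitions))  # prints 1: only one of the 4 partitions gets data
```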

Prefer Round Robin or Random, or partition on a different key.
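
For comparison, a rough Python sketch of round robin assignment. It ignores the key entirely and deals rows out by position, so identical values do not matter. `round_robin_partition` is a hypothetical helper, not a DataStage API.

```python
from collections import Counter

def round_robin_partition(row_index, num_partitions=4):
    # Row 0 -> partition 0, row 1 -> partition 1, ... wrapping around.
    return row_index % num_partitions

# All rows have the same value, as in the original question (scaled down).
rows = ["SAME_VALUE"] * 1000
rr_counts = Counter(round_robin_partition(i) for i in range(len(rows)))
print(sorted(rr_counts.items()))  # each of the 4 partitions gets 250 rows
```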

kumar_s · Post by **kumar_s** » Fri Aug 25, 2006 1:50 am

You need to decide the partitioning based on your requirement. If you need a grouping function such as an aggregation (for example, a count of records), you have to use key-based partitioning (hash); otherwise you can proceed with what has been suggested.
For grouping in your case you can even run in sequential mode, since hash would put every row on one node anyway.
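
Why grouping needs key-based partitioning can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names are invented, and CRC32 stands in for the real hash. Each partition counts its own rows per key, the way a parallel aggregation would.

```python
import zlib
from collections import Counter, defaultdict

def per_partition_counts(keys, assign, num_partitions=4):
    # Each partition keeps its own count per key.
    parts = defaultdict(Counter)
    for i, key in enumerate(keys):
        parts[assign(i, key, num_partitions)][key] += 1
    return dict(parts)

def by_hash(i, key, n):
    # Key-based: identical keys always share a partition (CRC32 is assumed).
    return zlib.crc32(key.encode("utf-8")) % n

def by_round_robin(i, key, n):
    # Position-based: the key is ignored.
    return i % n

keys = ["A"] * 8
hash_result = per_partition_counts(keys, by_hash)
rr_result = per_partition_counts(keys, by_round_robin)
print(hash_result)  # one partition with the complete count of 8
print(rr_result)    # four partitions, each with a partial count of 2
```

With round robin, each node only sees part of the group, so the per-node counts are partial and would have to be combined in a later step. With hash, the single node that receives the key has the whole group.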