Hash Partitioning on columns with same values

Posted: Thu Aug 24, 2006 7:36 am
by Madhu1981
Hi All,

I have a configuration file with 4 nodes. I have a job where I need to hash partition on one column (call it column A). About a million records come from the source, and column A has the same value in every record.

When I perform hash partitioning, will the data be spread across all 4 nodes, or will it all move to one node?
Kindly clarify.

Thanks in advance.

Posted: Thu Aug 24, 2006 8:10 am
by thumsup9
Just copied this from the DSX PDF:

Although the data is distributed across partitions, the hash partitioner ensures that records with identical keys are in the same partition, allowing duplicates to be found.
Hash partitioning does not necessarily result in an even distribution of data between partitions. For example, if you hash partition a data set based on a zip code field, where a large percentage of your records are from one or two zip codes, you can end up with a few partitions containing most of your records. This behavior can lead to bottlenecks because some nodes are required to process more records than other nodes.
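To illustrate the zip code example, here is a minimal Python sketch of a hash partitioner over 4 partitions. It is not DataStage's actual implementation: `zlib.crc32` just stands in for whatever hash function the engine uses, and the zip code counts are made up. The point is only that every copy of a repeated key lands in the same partition.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # matches a 4-node configuration file

def hash_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition number; identical keys always map to the same partition."""
    # Assumption: CRC-32 as a stand-in for the engine's real hash function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Hypothetical skewed input: 9,000 of 10,000 records share just two zip codes.
zip_codes = ["02134"] * 6000 + ["90210"] * 3000 + [f"{10000 + i:05d}" for i in range(1000)]

rows_per_partition = Counter(hash_partition(z) for z in zip_codes)
for p in range(NUM_PARTITIONS):
    print(f"partition {p}: {rows_per_partition[p]} rows")
```

Whichever partition "02134" hashes to ends up with at least 6,000 of the 10,000 rows, so that node does most of the work.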

Posted: Thu Aug 24, 2006 8:13 am
by kcbland
All rows go to one node. Hash means same values stay together on a node.
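A quick way to see this is a small Python simulation of the original question: a million rows, one distinct value in column A, 4 partitions. This is only a sketch; `zlib.crc32` is an assumed stand-in for the real hash function, and "X" is a made-up value for column A.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4

def hash_partition(key, num_partitions=NUM_PARTITIONS):
    # Same input bytes -> same CRC-32 -> same partition number, every time.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions

column_a = ["X"] * 1_000_000  # hypothetical: every row has the same value in column A

rows_per_partition = Counter(hash_partition(a) for a in column_a)
print(dict(rows_per_partition))  # one partition holds all 1,000,000 rows
```

The other three partitions receive nothing, so three of the four nodes sit idle for this stage.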

Posted: Fri Aug 25, 2006 1:45 am
by ray.wurlod
The reason is that every value of "A" generates the same hash value.

It's the same as in SQL - if you group by a column that contains only one distinct value you will end up with one group.
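The SQL analogy can be sketched in a few lines of Python: grouping rows by a key that has only one distinct value yields exactly one group, which is why a hash partitioner keyed on that column can only fill one partition. The rows here are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: column A has a single distinct value, "X".
rows = [{"A": "X", "amount": i} for i in range(10)]

groups = defaultdict(list)
for row in rows:
    groups[row["A"]].append(row)  # group by column A, like GROUP BY A in SQL

print(len(groups))  # 1 group, just as SELECT A, COUNT(*) FROM t GROUP BY A returns one row
```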

Prefer Round Robin or Random, or partition on a different key.
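For comparison, here is a rough Python sketch of round robin and random partitioning on the same million rows. Neither method looks at column A at all, so the identical values no longer matter. The function names and the fixed seed are illustrative assumptions, not DataStage settings.

```python
import random
from collections import Counter

NUM_PARTITIONS = 4

def round_robin_partition(num_rows, num_partitions=NUM_PARTITIONS):
    # Row i goes to partition i mod N, regardless of its column values.
    return [i % num_partitions for i in range(num_rows)]

def random_partition(num_rows, num_partitions=NUM_PARTITIONS, seed=42):
    # Each row goes to a uniformly random partition; roughly even for large inputs.
    rng = random.Random(seed)
    return [rng.randrange(num_partitions) for _ in range(num_rows)]

rr_counts = Counter(round_robin_partition(1_000_000))
rand_counts = Counter(random_partition(1_000_000))
print("round robin:", dict(rr_counts))    # exactly 250,000 rows per partition
print("random:     ", dict(rand_counts))  # close to 250,000 per partition
```

The trade-off is that rows with the same key are now spread across nodes, which matters if a later stage needs to group on that key.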

Posted: Fri Aug 25, 2006 1:50 am
by kumar_s
You need to choose the partitioning method based on your requirements. If you need a grouping function such as an aggregation (e.g. a count of records per key), you have to use key-based partitioning (hash) on the grouping column; otherwise you can go with what has been suggested above.
For grouping in your case, you could even run in sequential mode ;-)
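Why grouping needs hash partitioning can be sketched in Python: each partition runs its own count independently. With round robin, rows sharing a key are scattered, so every partition reports only a partial count; with hash partitioning on the key, each key's count is complete on a single partition. The keys "X" and "Y" are hypothetical, and `zlib.crc32` is an assumed stand-in for the engine's hash.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4

def hash_partition(key, n=NUM_PARTITIONS):
    # Assumption: CRC-32 stands in for the real hash function.
    return zlib.crc32(key.encode("utf-8")) % n

def count_per_partition(partitions):
    """Run a 'count rows per key' aggregation independently on each partition."""
    return [Counter(rows) for rows in partitions]

column_a = ["X"] * 1000 + ["Y"] * 500  # hypothetical keys

# Round robin: rows with the same key are scattered, so each partition sees a partial count.
rr = [[] for _ in range(NUM_PARTITIONS)]
for i, key in enumerate(column_a):
    rr[i % NUM_PARTITIONS].append(key)

# Hash: all rows with the same key land in one partition, so each count is complete.
hashed = [[] for _ in range(NUM_PARTITIONS)]
for key in column_a:
    hashed[hash_partition(key)].append(key)

print("round robin:", count_per_partition(rr))   # X: 250 and Y: 125 on every partition
print("hash:       ", count_per_partition(hashed))
```

With only one distinct key, the hash result is a single partition doing all the counting, which is effectively the same as running that step sequentially.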