Hi all,
I have a configuration file with 4 nodes. I have a job where I need to hash partition on one column (call it column A). A million records are coming from the source, and every record has the same value in column A.
When I perform the hash partitioning, will the data be spread across the 4 nodes, or will it all move to one node?
Kindly clarify.
Thanks in advance.
Hash Partitioning on columns with same values
Just copied it from dsx pdf...
Although the data is distributed across partitions, the hash partitioner ensures that records with identical keys are in the same partition, allowing duplicates to be found.
Hash partitioning does not necessarily result in an even distribution of data between partitions. For example, if you hash partition a data set based on a zip code field, where a large percentage of your records are from one or two zip codes, you can end up with a few partitions containing most of your records. This behavior can lead to bottlenecks because some nodes are required to process more records than other nodes.
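To make the zip code example concrete, here is a minimal Python sketch of the skew it describes. It is only an illustration: CRC32 stands in for the partitioner's real hash function (which this thread does not show), and the zip codes and record counts are made up.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4

def hash_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Assumption: CRC32 is a stand-in hash, not the product's actual function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Hypothetical data: 900 of 1000 records share one zip code.
zips = ["10001"] * 900 + [str(20000 + i) for i in range(100)]

counts = Counter(hash_partition(z) for z in zips)
for partition in range(NUM_PARTITIONS):
    print(f"partition {partition}: {counts.get(partition, 0)} records")
```

Whichever partition the dominant zip code hashes to receives at least 900 of the 1000 records, so that node does most of the work.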
All the data will go to one node. The reason is that every "A" generates the same hash value, so every record is assigned to the same partition.
It's the same as in SQL - if you group by a column that contains only one distinct value you will end up with one group.
Prefer Round Robin or Random, or partition on a different key.
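A rough Python sketch of the difference, assuming a simple "hash modulo number of partitions" scheme. CRC32 is a stand-in for the real hash function, the function names are made up for this illustration, and the record count is scaled down from the million in the question.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4

def hash_partition(key: str, n: int = NUM_PARTITIONS) -> int:
    # Assumption: stand-in hash; identical keys always give the same result.
    return zlib.crc32(key.encode("utf-8")) % n

def round_robin_partition(row_index: int, n: int = NUM_PARTITIONS) -> int:
    # Ignores the key and deals rows out to partitions in turn.
    return row_index % n

# Every record has the same value in column A.
rows = ["A"] * 100_000

hash_counts = Counter(hash_partition(value) for value in rows)
rr_counts = Counter(round_robin_partition(i) for i in range(len(rows)))

print("hash:       ", dict(hash_counts))
print("round robin:", dict(rr_counts))
```

With hash, only one partition number appears and it holds all 100,000 rows. With round robin, each of the 4 partitions gets 25,000.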
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
You need to decide the partitioning based on your requirement. If you need a grouping function such as an aggregation (count of records), you have to use a key-based partitioning method (hash); otherwise you can proceed with what has been suggested.
For grouping in your case you can even go for sequential mode ;)
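The point about grouping can be sketched in Python as follows. This is an illustration under assumptions, not how the engine is implemented: CRC32 stands in for the real hash, and `per_partition_counts` is a made-up helper that counts records per key inside each partition, the way an aggregation running independently on each node would.

```python
import zlib
from collections import Counter, defaultdict

NUM_PARTITIONS = 4

def hash_partition(key: str, n: int = NUM_PARTITIONS) -> int:
    # Assumption: stand-in hash, not the product's actual function.
    return zlib.crc32(key.encode("utf-8")) % n

def per_partition_counts(rows, assign):
    # Count records per key separately within each partition.
    parts = defaultdict(Counter)
    for index, key in enumerate(rows):
        parts[assign(index, key)][key] += 1
    return dict(parts)

rows = ["A"] * 8

by_hash = per_partition_counts(rows, lambda i, k: hash_partition(k))
by_round_robin = per_partition_counts(rows, lambda i, k: i % NUM_PARTITIONS)

print("hash:       ", {p: dict(c) for p, c in by_hash.items()})
print("round robin:", {p: dict(c) for p, c in sorted(by_round_robin.items())})
```

With hash, one partition reports the complete count of 8 for "A". With round robin, four partitions each report a partial count of 2, so the partial results would still have to be combined (or the stage run sequentially) to get the true count.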
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'