I am processing 1 million records on 4 nodes. The parallel job has a Join stage, so I am partitioning on the 4 key columns. The data is not distributed evenly across the four nodes because some key groups have many records and others have very few.
When I use range partitioning instead, the records are distributed evenly across all nodes.
Is there any disadvantage to using range partitioning? Can anyone please tell me the advantages and disadvantages of range partitioning compared with hash partitioning in this scenario, and which partitioning method is best here?
Thanks
dstest
Range Partitioning Vs Hash
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The only disadvantage is the need to preprocess your data to write the range map used by the partitioning algorithm. Along with this goes the need for a standard naming convention for your range maps so that the correct range map is associated with particular sets of data and the range map for one job is not simultaneously being destroyed by a concurrently running job.
There are special settings for Funnel and Collectors that work best with range-partitioned data.
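To make the "preprocess your data to write the range map" step concrete, here is a minimal Python sketch of the general idea. It is not DataStage code and does not use any DataStage API: the function names `build_range_map` and `range_partition` and the sample keys are made up for illustration, and it assumes the range map is simply a sorted list of boundary keys taken at even quantiles of the data.

```python
from bisect import bisect_right
from collections import Counter


def build_range_map(keys, num_partitions):
    """Preprocessing pass: sort the keys and pick num_partitions - 1
    boundary keys at evenly spaced positions (hypothetical helper)."""
    ordered = sorted(keys)
    n = len(ordered)
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]


def range_partition(key, boundaries):
    """Return the partition number for a key: the count of boundaries
    that are less than or equal to the key (hypothetical helper)."""
    return bisect_right(boundaries, key)


# Made-up data: 40 key groups of uneven size, each key being a tuple
# of 4 columns to mirror the 4 key columns in the question.
keys = [(g, g % 5, g % 3, g % 2) for g in range(40) for _ in range(1 + (g % 7) * 5)]

boundaries = build_range_map(keys, 4)
counts = Counter(range_partition(k, boundaries) for k in keys)

print("range map:", boundaries)
print("rows per partition:", [counts.get(p, 0) for p in range(4)])
```

The boundaries are only as good as the data they were computed from, which is why the map has to be rebuilt (and named so that concurrent jobs do not overwrite each other's maps) whenever the data changes.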
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Safest is to pre-process all the data every time. Think about it.
- You don't process the same data in production that you do in development.
- You rarely re-process the same set of data.
- If the data happen to be sorted and your sample is "the first n% of rows", your range map will be badly wrong.
- If your sample is of the form "random n% of rows", you are processing all the rows anyway.
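The sampling points above can be illustrated with a small Python sketch. Again, this is not DataStage code: the helper names and the data are hypothetical, the key is a single integer column that happens to arrive sorted, and the range map is assumed to be a list of quantile boundaries. Note that the random sample here is drawn from an in-memory list, so the sketch does not show the cost of reading every row to pick that sample.

```python
import random
from bisect import bisect_right
from collections import Counter


def build_range_map(sample, num_partitions):
    """Pick num_partitions - 1 boundary keys at even quantiles of the sample."""
    ordered = sorted(sample)
    n = len(ordered)
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]


def partition_counts(keys, boundaries):
    """Count how many keys land in each range partition."""
    counts = Counter(bisect_right(boundaries, k) for k in keys)
    return [counts.get(p, 0) for p in range(len(boundaries) + 1)]


# Made-up input: 100,000 rows whose key arrives already sorted.
keys = list(range(100_000))
sample_size = len(keys) // 10  # a 10% sample

head_sample = keys[:sample_size]                           # "the first n% of rows"
random_sample = random.Random(42).sample(keys, sample_size)  # "random n% of rows"

head_counts = partition_counts(keys, build_range_map(head_sample, 4))
random_counts = partition_counts(keys, build_range_map(random_sample, 4))
full_counts = partition_counts(keys, build_range_map(keys, 4))

print("map from first 10%: ", head_counts)
print("map from random 10%:", random_counts)
print("map from all rows:  ", full_counts)
```

With sorted input, the map built from the first 10% puts all its boundaries inside that first slice, so nearly every remaining row falls into the last partition. The map built from all rows splits this particular data evenly.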
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.