Hash partitioning and Sorting

Posted: Thu Feb 07, 2013 9:16 am
by Jayakannan
As per my understanding of Hash partitioning, rows with the same key values are sent to the same processing node.

How does Hash partitioning work with and without the key values sorted? Why is sorting mandatory when the partitioning method is Hash? What happens if the records are Hash partitioned but not sorted in stages like Join, Remove Duplicates, Change Capture etc.?

Re: Hash partitioning and Sorting

Posted: Thu Feb 07, 2013 9:54 am
by zulfi123786
Jayakannan wrote: As per my understanding of Hash partitioning, rows with the same key values are sent to the same processing node.
Correct.
Jayakannan wrote: How does Hash partitioning work with and without the key values sorted?
The hash operator does not require sorted data, so the result is the same either way; sorting beforehand only adds the extra burden of the sort.
Jayakannan wrote: Why is sorting mandatory when the partitioning method is Hash?
Wrong, it's not required.
Jayakannan wrote: What happens if the records are Hash partitioned but not sorted in stages like Join, Remove Duplicates, Change Capture etc.?
You end up with improper data. These stages mandate sorted input, and if there is no explicit sort, tsort operators are inserted wherever required (there are reported cases where this did not happen and the data was not as expected).
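
To illustrate the "improper data" point, here is a minimal Python sketch of a duplicate-removal step that only compares adjacent rows, which is roughly how a sort-dependent stage behaves. It is a simplified model and not DataStage's actual implementation; `hash_partition`, `dedup_adjacent` and the sample rows are hypothetical names and data made up for the illustration.

```python
from itertools import groupby
import zlib

NODES = 2  # assumed node count for the illustration

def hash_partition(rows, nodes):
    """Send each (key, value) row to the node chosen by a stable hash of its key."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[zlib.crc32(row[0].encode()) % nodes].append(row)
    return parts

def dedup_adjacent(part):
    """Keep the first row of every run of adjacent equal keys (assumes sorted input)."""
    return [next(run) for _, run in groupby(part, key=lambda row: row[0])]

# Three distinct keys, interleaved, on two nodes: at least two keys share a node.
rows = [("A", 1), ("B", 2), ("C", 3), ("A", 4), ("B", 5), ("C", 6)]
parts = hash_partition(rows, NODES)

# Hash partitioned but NOT sorted: equal keys are on the same node but not adjacent.
unsorted_out = [row for part in parts for row in dedup_adjacent(part)]

# Hash partitioned AND sorted within each partition: equal keys are adjacent.
sorted_out = [row for part in parts
              for row in dedup_adjacent(sorted(part, key=lambda row: row[0]))]

print(len(unsorted_out))  # more than 3: duplicates survive
print(len(sorted_out))    # 3: one row per key
```

Hash partitioning puts every "A" row on the same node, but only the sort within the partition makes those rows adjacent, and adjacency is what the comparison relies on.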

Posted: Thu Feb 07, 2013 3:52 pm
by ray.wurlod
Some stages require sorted input because of the way they operate. This is unrelated to the partitioning algorithm used.

If you do not achieve key adjacency using a key-based partitioning algorithm, your results can be simply wrong; for example, on four nodes summarising by US state, you can end up with as many as 200 groups (4 x 50) instead of 50.
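
The 4 x 50 arithmetic can be reproduced with a small Python sketch. It is only a model of per-node aggregation, not DataStage code; the state codes `S00` to `S49`, the helper names and the use of `zlib.crc32` as the hash are all assumptions made for the illustration.

```python
from collections import Counter
import zlib

NODES = 4
STATES = [f"S{i:02d}" for i in range(50)]  # stand-ins for the 50 US state codes

# Four rows per state, so round robin hands every state to every node.
rows = [(state, 1) for state in STATES for _ in range(NODES)]

def partition(rows, nodes, choose):
    """Distribute rows over nodes; choose(i, row) returns an integer routing value."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[choose(i, row) % nodes].append(row)
    return parts

def summarise(part):
    """Per-node aggregation: total amount per state seen on this node."""
    totals = Counter()
    for state, amount in part:
        totals[state] += amount
    return totals

def total_groups(parts):
    """Number of output groups when every node summarises independently."""
    return sum(len(summarise(part)) for part in parts)

round_robin = partition(rows, NODES, lambda i, row: i)
hashed = partition(rows, NODES, lambda i, row: zlib.crc32(row[0].encode()))

print(total_groups(round_robin))  # 200
print(total_groups(hashed))       # 50
```

With round robin each node sees all 50 states and emits its own partial total for each, hence 200 groups. With key-based (hash) partitioning each state lives on exactly one node, so the job emits one group per state.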