Hash partitioning and Sorting

Jayakannan · Post by **Jayakannan** » Thu Feb 07, 2013 9:16 am

As per my understanding of Hash partitioning, records with the same key value are sent to the same processing node.

How does Hash partitioning work with and without the key values sorted? Why is sorting mandatory when the partitioning method is Hash? What happens if the records are Hash partitioned but not sorted in stages like Join, Remove Duplicates, Change Capture etc.?

zulfi123786 · Post by **zulfi123786** » Thu Feb 07, 2013 9:54 am

Jayakannan wrote: As per my understanding of Hash partitioning, records with the same key value are sent to the same processing node.

Correct

Jayakannan wrote: How does Hash partitioning work with and without the key values sorted?

The hash partitioner does not require sorted data, so the result is the same either way; sorting first only adds the extra cost of the sort.
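To illustrate the point, here is a minimal Python sketch, not DataStage code. The `hash_partition` and `key_to_node` helpers and the use of CRC32 are assumptions made for the example; the actual hash function used by the engine is not stated in this thread. It shows that the node chosen for a key depends only on the key value, so sorted and unsorted input give the same key-to-node placement.

```python
import zlib
from collections import defaultdict

def hash_partition(records, key, num_nodes):
    """Assign each record to a node using only a hash of its key value."""
    partitions = defaultdict(list)
    for rec in records:
        # CRC32 is a stand-in hash chosen for this sketch (an assumption).
        node = zlib.crc32(str(rec[key]).encode("utf-8")) % num_nodes
        partitions[node].append(rec)
    return partitions

def key_to_node(partitions, key):
    """Map each key value to the set of nodes it landed on."""
    placement = defaultdict(set)
    for node, recs in partitions.items():
        for rec in recs:
            placement[rec[key]].add(node)
    return dict(placement)

states = ["TX", "CA", "NY", "CA", "TX", "WA", "NY", "CA"]
rows = [{"state": s, "amount": i} for i, s in enumerate(states)]

unsorted_placement = key_to_node(hash_partition(rows, "state", 4), "state")
sorted_rows = sorted(rows, key=lambda r: r["state"])
sorted_placement = key_to_node(hash_partition(sorted_rows, "state", 4), "state")

# Same placement either way, and every key lives on exactly one node.
print(unsorted_placement == sorted_placement)
```

Only the order of records inside each partition differs between the two runs, which is why any sort belongs after the partitioning step, not before it.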

Jayakannan wrote: Why is sorting mandatory when the partitioning method is Hash?

Wrong, it's not required.

Jayakannan wrote: What happens if the records are Hash partitioned but not sorted in stages like Join, Remove Duplicates, Change Capture etc.?

You end up with incorrect data. These stages require sorted input, and if there is no explicit sort, tsort operators are inserted wherever required (there have been reported cases where this did not happen and the data was not as expected).
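As a rough illustration of why unsorted input goes wrong, the Python sketch below assumes a duplicate-removal step that only compares each record with the one immediately before it. The `dedup_adjacent` function is hypothetical and is not the stage's actual implementation. Within a single partition, duplicates that are not adjacent slip through.

```python
def dedup_adjacent(records, key):
    """Keep a record only when its key differs from the previous record's key."""
    out = []
    previous = object()  # sentinel that never equals a real key value
    for rec in records:
        if rec[key] != previous:
            out.append(rec)
        previous = rec[key]
    return out

# One partition after hashing: both keys are on the right node but unsorted.
partition = [{"id": 2}, {"id": 1}, {"id": 2}, {"id": 1}]

unsorted_result = dedup_adjacent(partition, "id")
sorted_result = dedup_adjacent(sorted(partition, key=lambda r: r["id"]), "id")

print(len(unsorted_result))  # 4: no duplicate was adjacent, so none was removed
print(len(sorted_result))    # 2: sorting made the duplicates adjacent
```

Hash partitioning puts the duplicates on the same node; the sort is what places them next to each other.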

ray.wurlod · Post by **ray.wurlod** » Thu Feb 07, 2013 3:52 pm

Some stages require sorted input because of the way they operate. This is unrelated to the partitioning algorithm used.

If you do not achieve key adjacency using a key-based partitioning algorithm, your results can be simply wrong; for example, summarising by US state on four nodes, you can end up with as many as 200 groups (4 x 50).
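The 4 x 50 arithmetic can be reproduced with a small Python sketch. Everything here is an assumption made for illustration: the placeholder state codes, the `count_groups` helper, and CRC32 as the key-based hash. Each node summarises only the records it receives, so a state spread over all four nodes yields four partial groups.

```python
import zlib
from collections import Counter

def count_groups(records, num_nodes, assign):
    """Summarise per node, then count the groups produced across all nodes."""
    per_node = [Counter() for _ in range(num_nodes)]
    for i, state in enumerate(records):
        per_node[assign(i, state, num_nodes)][state] += 1
    return sum(len(counter) for counter in per_node)

def round_robin(i, state, n):
    """Not key-based: the node depends on the record position only."""
    return i % n

def hash_key(i, state, n):
    """Key-based: the node depends on the state value only."""
    return zlib.crc32(state.encode("utf-8")) % n

# 50 placeholder state codes, four consecutive records for each.
states = [f"S{k:02d}" for k in range(50)]
records = [s for s in states for _ in range(4)]

rr_groups = count_groups(records, 4, round_robin)
hash_groups = count_groups(records, 4, hash_key)

print(rr_groups)    # 200: every state ends up on all four nodes
print(hash_groups)  # 50: every state ends up on exactly one node
```

The round-robin figure is the worst case for this particular input order; other orders give fewer than 200 groups but still more than 50 whenever a state is split across nodes.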
