Hash partitioning and Sorting

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Jayakannan
Participant
Posts: 73
Joined: Wed Sep 30, 2009 5:20 am

Hash partitioning and Sorting

Post by Jayakannan »

As I understand hash partitioning, rows with the same key values are sent to the same processing node.

How does hash partitioning work with and without sorted key values? Why is sorting mandatory when the partitioning method is Hash? What happens if the records are hash partitioned but not sorted in stages like Join, Remove Duplicates, Change Capture, etc.?
Regards,
Kannan
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Re: Hash partitioning and Sorting

Post by zulfi123786 »

Jayakannan wrote: As I understand hash partitioning, rows with the same key values are sent to the same processing node.
Correct.
Jayakannan wrote: How does hash partitioning work with and without sorted key values?
The hash partitioner does not require sorted data, so the partitioning result is the same either way; sorting first only adds the extra burden of the sort.
Jayakannan wrote: Why is sorting mandatory when the partitioning method is Hash?
It isn't. Hash partitioning does not require sorting.
Jayakannan wrote: What happens if the records are hash partitioned but not sorted in stages like Join, Remove Duplicates, Change Capture, etc.?
You end up with incorrect results. These stages require sorted input, and if there is no explicit sort, tsort operators are inserted wherever required (there are reported cases where this did not happen and the data was not as expected).
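
To make the last answer concrete, here is a rough Python simulation (not DataStage code) of what can go wrong. It assumes a stage that only compares each row with the one before it, which is how a sort-dependent duplicate removal behaves. The function names `hash_partition` and `dedup_adjacent` are made up for this sketch, and the CRC32 hash is just a stand-in for whatever hash the engine actually uses.

```python
import zlib
from itertools import groupby

def hash_partition(rows, num_nodes):
    """Send every row with the same key to the same partition (stable hash)."""
    partitions = [[] for _ in range(num_nodes)]
    for key in rows:
        partitions[zlib.crc32(key.encode()) % num_nodes].append(key)
    return partitions

def dedup_adjacent(partition):
    """Keep one row per run of adjacent equal keys, as a sort-dependent stage would."""
    return [key for key, _ in groupby(partition)]

rows = ["A", "B", "A", "C", "B", "A"]   # three distinct keys, unsorted
partitions = hash_partition(rows, num_nodes=2)

# Same partitioning in both cases; only the sort within each partition differs.
unsorted_out = sum(len(dedup_adjacent(p)) for p in partitions)
sorted_out = sum(len(dedup_adjacent(sorted(p))) for p in partitions)

print(sorted_out)                 # 3: one row per distinct key
print(unsorted_out > sorted_out)  # True: duplicates survive without the sort
```

The partitioning is identical in both runs, so the extra rows come purely from the missing sort inside each partition.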
- Zulfi
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Some stages require sorted input because of the way they operate. This is unrelated to the partitioning algorithm used.

If you do not achieve key adjacency by using a key-based partitioning algorithm, your results can be simply wrong. For example, when summarising by US state on four nodes, you can end up with as many as 200 groups (4 x 50), because each node can produce its own group for every state.
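
The 4 x 50 arithmetic can be checked with a small Python simulation (again, not DataStage code). It assumes 50 placeholder state codes (`S00` to `S49`), four rows per state, and a per-node aggregation that knows nothing about the other nodes. The helper names and the CRC32 hash are illustrative assumptions only.

```python
import zlib
from collections import Counter

NUM_NODES = 4
states = [f"S{i:02d}" for i in range(50)]             # placeholder state codes
rows = [state for state in states for _ in range(4)]  # four rows per state

def round_robin(rows, num_nodes):
    """Non-key-based partitioning: deal rows out to nodes in turn."""
    parts = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        parts[i % num_nodes].append(row)
    return parts

def hash_partition(rows, num_nodes):
    """Key-based partitioning: the same state always lands on the same node."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[zlib.crc32(row.encode()) % num_nodes].append(row)
    return parts

def count_groups(parts):
    """Each node aggregates on its own; return the total number of output groups."""
    return sum(len(Counter(part)) for part in parts)

round_robin_groups = count_groups(round_robin(rows, NUM_NODES))
hash_groups = count_groups(hash_partition(rows, NUM_NODES))

print(round_robin_groups)  # 200: every state shows up on all four nodes
print(hash_groups)         # 50: one group per state
```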
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.