Sort + Partition

karthi_gana
Premium Member
Posts: 729
Joined: Tue Apr 28, 2009 10:49 pm


Post by karthi_gana »

All,

I have read about the Sort stage in the "Parallel Developer's Guide", but I didn't see which partitioning method DataStage uses by default.

Maybe "Hash" (if the key is a varchar and there is more than one column)

or

maybe "Modulus" (if the key is numeric and there is only one column).


Which partitioning method is best for sorting a billion records? I know the Sort stage itself creates a performance bottleneck, but the records need to be sorted before processing because the source is a sequential file.

What is the default partitioning algorithm DataStage uses for the Sort stage?


At the moment I am not using the Sort stage to sort the data. In each of the cases below, how will the sort operation be performed?

A) "Hash" partitioning with "Perform Sort"

B) "Modulus" partitioning with "Perform Sort"

C) "Range" partitioning with "Perform Sort"

D) "DB2" partitioning with "Perform Sort"


Karthik
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Auto will cause Hash to be chosen. Search for other posts on this topic.

Sort will always separately sort the records on the individual nodes (partitions).
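
To illustrate what "separately sort the records on the individual nodes" means, here is a rough Python sketch of the idea, not of DataStage internals. The function names, the sample records, and the use of `zlib.crc32` as the hash are all assumptions made for illustration; DataStage's actual hash function is not shown here.

```python
import zlib

def hash_partition(records, key, num_partitions):
    """Send each record to a partition chosen by a stable hash of its key."""
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        # crc32 is used only because it gives the same result on every run;
        # it is an assumption, not the hash DataStage uses.
        idx = zlib.crc32(str(rec[key]).encode("utf-8")) % num_partitions
        partitions[idx].append(rec)
    return partitions

def sort_each_partition(partitions, key):
    """Sort every partition on its own; no ordering is imposed across partitions."""
    return [sorted(p, key=lambda r: r[key]) for p in partitions]

# Hypothetical sample data: (customer key, amount)
records = [
    {"cust": c, "amt": a}
    for c, a in [("delta", 4), ("alpha", 1), ("charlie", 3),
                 ("bravo", 2), ("alpha", 5), ("echo", 6)]
]

parts = sort_each_partition(hash_partition(records, "cust", 3), "cust")

for i, p in enumerate(parts):
    print(i, [r["cust"] for r in p])
```

Each partition comes out sorted, and all records with the same key value land in the same partition, but reading the partitions one after another does not generally give a single globally sorted stream.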
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.