
Sort + Partition

Posted: Sun Mar 25, 2012 2:14 am
by karthi_gana
All,

I have read about the Sort stage in the "Parallel Developer's Guide", but I couldn't find which partitioning method DataStage uses by default.

Maybe "Hash" (if the key is a varchar and there is more than one key column).

or

Maybe "Modulus" (if the key is numeric and there is only one key column).
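To make the difference between these two key-based methods concrete, here is a simplified Python model. Modulus places a row in partition `key % number_of_partitions`, so it only makes sense for a single integer key. Hash places a row according to a hash of one or more key columns of any type. The function names and the use of `zlib.crc32` are hypothetical stand-ins for illustration; this is not DataStage's actual hash function.

```python
import zlib

def partition_modulus(key: int, num_partitions: int) -> int:
    # Modulus: the integer key value mod the number of partitions
    return key % num_partitions

def partition_hash(keys: tuple, num_partitions: int) -> int:
    # Hash: join one or more key columns into bytes and hash them.
    # crc32 is a hypothetical stand-in for the engine's own hash function.
    data = "\x1f".join(str(k) for k in keys).encode("utf-8")
    return zlib.crc32(data) % num_partitions

rows = [(101, "SMITH"), (102, "JONES"), (103, "SMITH"), (104, "BROWN")]
n = 4
for cust_id, name in rows:
    # Rows with equal key values always land in the same partition
    print(cust_id, name, partition_modulus(cust_id, n), partition_hash((name,), n))
```

Either way, rows with equal key values end up in the same partition, which is what a key-based sort or aggregation needs.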


Which partitioning method is best for sorting a billion records? I know the Sort stage itself can create a performance bottleneck, but the records need to be sorted before processing them (the source is a sequential file).

What is the default partitioning algorithm that DataStage uses for the Sort stage?


Now, suppose I am not using a Sort stage to sort the data at all:

A) I have used "Hash" partitioning with "Perform Sort". How will the sort be performed?

B) I have used "Modulus" partitioning with "Perform Sort". How will the sort be performed?

C) I have used "Range" partitioning with "Perform Sort". How will the sort be performed?

D) I have used "DB2" partitioning with "Perform Sort". How will the sort be performed?



Posted: Sun Mar 25, 2012 3:51 am
by ray.wurlod
Auto will cause Hash to be chosen. Search for other posts on this topic.

Sort will always separately sort the records on the individual nodes (partitions).
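A simplified Python model of that behaviour, assuming 3 partitions: the rows are first distributed, then each partition is sorted on its own, as if one sort ran per node. With hash partitioning (which is what Auto picks, per the reply above), rows with the same key land together, but reading the partitions one after another does not give a single globally sorted stream. With range partitioning, concatenating the sorted partitions does give a total order. The fixed boundaries `["h", "p"]` and the `crc32` hash are hypothetical stand-ins; real range partitioning derives its boundaries from the data, and DataStage uses its own hash function.

```python
import zlib
from bisect import bisect_right

def hash_partition(key: str, n: int) -> int:
    # Hypothetical stand-in hash; not DataStage's actual hash function
    return zlib.crc32(key.encode("utf-8")) % n

def range_partition(key: str, boundaries: list) -> int:
    # Partition i holds keys between boundaries[i-1] and boundaries[i]
    return bisect_right(boundaries, key)

def partition_and_sort(keys, assign, n):
    parts = [[] for _ in range(n)]
    for k in keys:
        parts[assign(k)].append(k)
    # Each partition is sorted independently, like one sort per node
    return [sorted(p) for p in parts]

keys = ["delta", "alpha", "kilo", "echo", "zulu", "mike", "bravo", "tango", "alpha"]
n = 3

hashed = partition_and_sort(keys, lambda k: hash_partition(k, n), n)
ranged = partition_and_sort(keys, lambda k: range_partition(k, ["h", "p"]), n)

flat_ranged = [k for part in ranged for k in part]
print(all(p == sorted(p) for p in hashed))  # True: every partition is sorted
print(flat_ranged == sorted(keys))          # True: range partitions concatenate in key order
```

In this model every method gives sorted partitions; only range partitioning also makes the partitions follow one another in key order.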