Sort + Partition

Post by **karthi_gana** » Sun Mar 25, 2012 2:14 am

All,

I have read about the Sort stage in the "Parallel Developer's Guide", but I didn't see which partitioning method DataStage uses for it by default.

Maybe "Hash" (if the key is a varchar or spans more than one column),

or

maybe "Modulus" (if the key is a single numeric column).
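To make the difference between the two guesses concrete, here is a minimal Python sketch of how a single row might be assigned to a partition under each method. It is a simulation, not DataStage's implementation: `hash_partition` and `modulus_partition` are hypothetical names, and `zlib.crc32` only stands in for whatever hash function the engine actually uses. Modulus needs one integer key and computes `key % partitions`, while Hash accepts any key columns, including varchars.

```python
import zlib

def hash_partition(row, key_cols, num_partitions):
    """Hypothetical stand-in for Hash partitioning on one or more key columns."""
    # Join the key values into one string so multi-column and varchar keys work;
    # the "|" separator keeps ("a", "bc") and ("ab", "c") distinct.
    key = "|".join(str(row[c]) for c in key_cols).encode("utf-8")
    return zlib.crc32(key) % num_partitions

def modulus_partition(row, key_col, num_partitions):
    """Hypothetical stand-in for Modulus partitioning: one integer key, key mod N."""
    return int(row[key_col]) % num_partitions

# Made-up sample rows.
rows = [
    {"cust_id": 101, "region": "EU", "name": "Ann"},
    {"cust_id": 102, "region": "US", "name": "Bob"},
    {"cust_id": 101, "region": "EU", "name": "Cid"},
    {"cust_id": 205, "region": "APAC", "name": "Dee"},
]

for r in rows:
    print(r["name"],
          "hash ->", hash_partition(r, ["cust_id", "region"], 4),
          "modulus ->", modulus_partition(r, "cust_id", 4))
```

With four partitions, Modulus sends customers 101 and 205 to the same partition (both leave remainder 1). Putting different keys together is harmless; what matters for key-based processing after the sort is that rows with equal keys never end up in different partitions, which both methods guarantee.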

Which is the best partitioning method for sorting a billion records? I know the Sort stage itself can create a performance bottleneck, but the records need to be sorted before processing because the source is a sequential file.

What is the default partitioning algorithm DataStage uses for the Sort stage?

Now, I am not using a Sort stage to sort the data. Instead, I set a partitioning method together with "Perform Sort":

A) If I use "Hash" partitioning with "Perform Sort", how will the sort be performed?

B) If I use "Modulus" partitioning with "Perform Sort", how will the sort be performed?

C) If I use "Range" partitioning with "Perform Sort", how will the sort be performed?

D) If I use "DB2" partitioning with "Perform Sort", how will the sort be performed?
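Questions A to C can be illustrated with a small simulation of "partition first, then sort inside each partition". This is a sketch under stated assumptions, not DataStage code: the partitioner functions are hypothetical, `zlib.crc32` stands in for the real hash, and the range boundaries are made up (a real range partitioner reads them from a range map). DB2 partitioning is left out because it is meant to match the partitioning of a DB2 table, which cannot be mimicked without one.

```python
import zlib
from bisect import bisect_right

NUM_PARTITIONS = 3

# Hypothetical range boundaries, made up for this example.
BOUNDARIES = [40, 80]

def hash_part(key):
    # Stand-in hash partitioner: crc32 of the key text, mod the partition count.
    return zlib.crc32(str(key).encode("utf-8")) % NUM_PARTITIONS

def modulus_part(key):
    # Modulus partitioner: integer key mod the partition count.
    return key % NUM_PARTITIONS

def range_part(key):
    # Keys below 40 -> partition 0, 40..79 -> partition 1, 80 and above -> 2.
    return bisect_right(BOUNDARIES, key)

def partition_then_sort(keys, partitioner):
    """Mimic "partitioning + Perform Sort": every partition is sorted on its own."""
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for k in keys:
        parts[partitioner(k)].append(k)
    return [sorted(p) for p in parts]

keys = [57, 3, 91, 42, 18, 75, 60, 8, 99, 33]

for name, fn in [("hash", hash_part), ("modulus", modulus_part), ("range", range_part)]:
    parts = partition_then_sort(keys, fn)
    # Reading the partitions one after another only yields a fully sorted
    # sequence if the partitioner assigned keys by contiguous ranges.
    concatenated = [k for p in parts for k in p]
    print(name, parts, "globally ordered:", concatenated == sorted(keys))
```

With these sample keys the range case reports `True` and the modulus case `False`: in every case the sort works within each partition, and only range partitioning makes the partitions line up into one overall order.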

Post by **ray.wurlod** » Sun Mar 25, 2012 3:51 am

Auto partitioning will cause Hash to be chosen. Search for other posts on this topic.

Sort will always separately sort the records on the individual nodes (partitions).
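Because each node sorts only its own partition, producing one globally sorted stream (for example, to write a single sequential file) means combining the partitions in key order. DataStage provides a Sort Merge collector for this; the sketch below only illustrates the underlying idea, a k-way merge of already-sorted inputs, using Python's `heapq.merge`. The partition contents and the function name `sort_merge_collect` are made up for the example.

```python
import heapq

# Hypothetical result of per-node sorting: each partition is sorted on its own,
# but the partitions overlap in key range (as with hash or modulus partitioning).
sorted_partitions = [
    [3, 18, 33, 42, 57],
    [8, 60, 91],
    [75, 99],
]

def sort_merge_collect(partitions):
    """Combine already-sorted partitions into one sorted stream (k-way merge)."""
    # heapq.merge only compares the current head of each input, so it relies on
    # the per-partition sort instead of sorting all the data a second time.
    return list(heapq.merge(*partitions))

print(sort_merge_collect(sorted_partitions))
# -> [3, 8, 18, 33, 42, 57, 60, 75, 91, 99]
```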