Sorting Stage Tuning
Posted: Thu Oct 04, 2012 9:03 pm
Hi all,
I have a job that takes 0.3 billion records and does a lookup (with hashing), a sort (with hashing and partitioning), and an aggregation. The job uses a stable sort and takes 154 mins to finish.
To improve performance, I made the following changes to the job:
Test 1) Changed to a non-stable sort; the job finished in 172 mins.
Test 2) Kept the stable sort and set Restrict Memory Usage = 60MB; the job finished in 146 mins.
Test 3) Kept the stable sort, removed the unnecessary hashing before the lookup, and set Restrict Memory Usage to
a) 40MB: the job finished in 172 mins.
b) 60MB: the job aborted because the scratch disk was full.
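To put the timings side by side, here is a minimal sketch that computes each run's change against the 154-minute baseline. The numbers are copied from the tests above; the names `BASELINE_MIN`, `runs`, and `pct_change` are only illustrative, and Test 3b is left out because it aborted.

```python
# Elapsed times in minutes, copied from the test results in this post.
BASELINE_MIN = 154  # original job, stable sort

# Test 3b is omitted: it aborted, so there is no elapsed time to compare.
runs = {
    "Test 1 (non-stable sort)": 172,
    "Test 2 (stable sort, 60MB)": 146,
    "Test 3a (stable sort, hashing removed, 40MB)": 172,
}


def pct_change(elapsed, baseline=BASELINE_MIN):
    """Percent change against the baseline; positive means slower."""
    return (elapsed - baseline) / baseline * 100.0


for name, elapsed in runs.items():
    print(f"{name}: {elapsed} min ({pct_change(elapsed):+.1f}%)")
```

So only Test 2 beat the baseline, by about 5%; Tests 1 and 3a were both roughly 12% slower.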
I would like to ask:
i) Does the non-stable sort bring no benefit and only hurt performance?
ii) How should I decide the value of "Restrict Memory Usage"?
iii) Does a larger "Restrict Memory Usage" value need more scratch disk? Shouldn't it only mean that more memory is allocated to the sorting stage?