
Sorting Stage Tuning

Posted: Thu Oct 04, 2012 9:03 pm
by XRAY
Hi all

I have a job that takes 0.3 billion records and performs a lookup (with hashing), a sort (with hashing and partitioning), and an aggregation. The job uses a stable sort and takes 154 mins to finish.

To improve performance, I made the following changes to the job:

Test 1) Changed to a non-stable sort: the job finished in 172 mins.

Test 2) Kept the stable sort and set Restrict Memory Usage = 60MB: the job finished in 146 mins.

Test 3) Kept the stable sort, removed the unnecessary hashing before the lookup, and set Restrict Memory Usage to:

a) 40MB: the job finished in 172 mins.

b) 60MB: the job aborted because the scratch disk filled up.


I would like to ask:

i) Does a non-stable sort do no good and only hurt performance?
ii) How should I decide the value of "Restrict Memory Usage"?
iii) Does a higher "Restrict Memory Usage" need more scratch disk? Shouldn't it only mean that more memory is allocated to the sort stage?

Posted: Thu Oct 04, 2012 11:17 pm
by ray.wurlod
Stable sort does require rather more memory. Do you really need a stable sort? That is, is it really part of your requirement that the original order of records is preserved for each value of sort key? In my experience it rarely is. So I'd first look at disabling stable sort.
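
To make the stability requirement concrete, here is a minimal Python sketch (not DataStage code). The sample rows and the helper name is_stable_result are made up for illustration. It relies only on the fact that Python's built-in sorted() is stable, so rows with the same key keep their arrival order.

```python
from operator import itemgetter

# Each row is (sort key, original arrival position). Sample values are invented.
records = [("B", 1), ("A", 2), ("B", 3), ("A", 4)]

# sorted() is stable: within each key, rows stay in arrival order.
stable = sorted(records, key=itemgetter(0))
print(stable)


def is_stable_result(rows):
    """Return True if, within each run of equal keys, arrival positions ascend."""
    return all(a[1] < b[1] for a, b in zip(rows, rows[1:]) if a[0] == b[0])


# A non-stable sort may return either ordering within a key; both are "sorted".
possible_unstable = [("A", 4), ("A", 2), ("B", 3), ("B", 1)]
print(is_stable_result(stable), is_stable_result(possible_unstable))
```

If the downstream aggregation gives the same answer for both orderings, the job probably does not need a stable sort.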

I'd also look at cleaning up your scratch disk and, perhaps, even increasing the Restrict Memory Usage by the same amount again - that is, to 100MB per node - if you have sufficient free memory.

If all that fails, allocate more scratch disk.
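
As a rough check on "sufficient free memory", the arithmetic can be sketched as below. This is only an illustration: the function name and the node count of 4 are hypothetical, and the multiplier for several sort stages assumes each sort gets its own limit on every node.

```python
def total_sort_memory_mb(limit_mb_per_node, nodes, sort_stages=1):
    """Estimate total sort buffer memory, assuming the limit applies per node per sort."""
    return limit_mb_per_node * nodes * sort_stages


# 100MB per node on a hypothetical 4-node configuration with one sort stage.
print(total_sort_memory_mb(100, nodes=4))
```

Compare the result with the memory actually free on the server while the job runs before raising the limit.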