Is the sort merge collector optimised for node-sorted input?
Posted: Sun Sep 28, 2014 2:43 pm
Hi,
I was wondering whether the sort merge collector is optimised for input that has already been sorted in parallel on each node. One of the developers wanted a totally sorted sequential file and, to get it, used a Sort stage before the Sequential File stage and left the collector in Auto mode.
Before flipping the collector to Sort Merge, I wanted to know whether it would blindly re-sort the data or whether it is intelligent enough to recognise that the incoming data is already grouped and sorted on each node. With the file at 100 GB, I have to think along these lines.
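To make the distinction I am asking about concrete, here is a minimal Python sketch of a k-way merge over partitions that are each already sorted. It is not DataStage code and says nothing about how the collector is actually implemented; the function name `sort_merge_collect` and the sample data are hypothetical, and the assumption is simply that every node's partition arrives sorted on the same key.

```python
import heapq

def sort_merge_collect(partitions, key=lambda rec: rec):
    """Merge partitions that are each already sorted on `key`.

    Hypothetical illustration only: heapq.merge compares the current
    head record of each partition, so no partition is re-sorted.
    """
    return list(heapq.merge(*partitions, key=key))

# Assumed sample data: three nodes, each holding a slice sorted on the key.
node_partitions = [
    [1, 4, 7, 10],
    [2, 5, 8],
    [3, 6, 9],
]

merged = sort_merge_collect(node_partitions)
print(merged)
```

A merge like this only ever looks at one record per partition at a time, which is the behaviour I am hoping for on a 100 GB file, as opposed to a full re-sort of everything.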
Interestingly, the output file from the current run, despite being 100 GB, was totally sorted. I was expecting at least a few breaks in the order. Mysteries of the Auto mode! :)
Thanks