
Difference between Explicit sort and Sort on partition

Posted: Thu Jun 29, 2006 8:13 am
by dsdesigner
This question has been lingering in my mind for a long time, and I thought it best to get opinions from the forum.

This question relates to stages like Join and Aggregator, where the data needs to be partitioned and sorted. In the Join stage chapter of the Parallel Job Developer Guide, an explicit Sort stage is used to sort the data before the join. I have religiously followed the guide in my development and sorted the data with an explicit Sort stage before the join. Yet some of the developers I have come across do not use an explicit Sort stage; they sort each partition on the Join stage itself. In my experiments, both methods (sorting with an explicit Sort stage and sorting on the partition) yield the correct results.
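For anyone wondering why a join needs its inputs partitioned and sorted on the key in the first place, here is a minimal Python sketch of the general idea. It is not DataStage code, and the names `hash_partition`, `merge_join` and `partitioned_join` are made up for illustration. It assumes an inner join on a single integer key: rows are hash-partitioned so matching keys land in the same partition, each partition is sorted, and the two sorted inputs are then merged in one pass.

```python
def hash_partition(rows, key, n_parts):
    """Send each row to a partition chosen by the hash of its key value."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def merge_join(left, right, key):
    """Inner-join two lists that are ALREADY sorted on `key`."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-hand rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**r, **left[i]})
                i += 1
            j = j_end
    return out


def partitioned_join(left, right, key, n_parts=2):
    """Partition both inputs the same way, sort each partition, then merge."""
    result = []
    left_parts = hash_partition(left, key, n_parts)
    right_parts = hash_partition(right, key, n_parts)
    for lp, rp in zip(left_parts, right_parts):
        lp.sort(key=lambda r: r[key])
        rp.sort(key=lambda r: r[key])
        result.extend(merge_join(lp, rp, key))
    return result
```

Where the sort happens (a separate step or just before the merge) does not change the joined rows in this sketch, which is consistent with both methods giving correct results; only the resource usage can differ.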

My question is: which method is preferred? In other words, which would yield better performance, and why? What is the difference between the two methods?

I would greatly appreciate your thoughts on the above questions.

Thanks in advance.

dsdesigner

Posted: Thu Jun 29, 2006 11:04 am
by kumar_s
Hi dsdesigner,
This has been discussed many times earlier.
An explicit sort is always recommended as far as performance is concerned. The main reason is buffering. If the sort is done through the link option on a stage such as Join, both input files have to be buffered in full in order to be sorted before they can be joined. That will not perform as well as an explicit Sort stage, which is specially designed to handle large volumes of data using temporary scratch files.
But all this matters mainly when you have a large amount of data. For low volumes, it may be OK to sort on the partition, because that approach avoids an unnecessary stage and its associated memory allocation, etc.
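To illustrate what sorting with temporary scratch files means in general, here is a rough Python sketch of an external merge sort. This is only an illustration of the technique, not how DataStage implements its sort; `external_sort`, `max_in_memory` and `scratch_dir` are hypothetical names, and the input is assumed to be integers. At most `max_in_memory` values are buffered at a time; each full buffer is sorted and spilled to a scratch file, and the sorted runs are merged at the end.

```python
import heapq
import os
import tempfile


def external_sort(values, max_in_memory, scratch_dir=None):
    """Yield the integers in `values` in sorted order, spilling sorted runs to disk."""
    run_paths = []
    buffer = []

    def spill():
        # Sort the current buffer and write it out as one scratch "run".
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run", dir=scratch_dir)
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in buffer)
        run_paths.append(path)
        buffer.clear()

    for v in values:
        buffer.append(v)
        if len(buffer) >= max_in_memory:
            spill()
    if buffer:
        spill()

    # Merge the sorted runs; heapq.merge reads them lazily, line by line.
    files = [open(p) for p in run_paths]
    try:
        yield from heapq.merge(*[(int(line) for line in f) for f in files])
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

A plain in-memory sort, such as `sorted(values)`, needs the whole input in memory at once. That is fine for low volumes but becomes the limiting factor as the data grows, which is the trade-off described above.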

Posted: Thu Jun 29, 2006 11:25 pm
by sanjay
Hi Kumar,
I need to know: does sort on partition not use temporary scratch files?
Is it only the Sort stage that uses temporary scratch files?

Sanjay

Re: Difference between Explicit sort and Sort on partition

Posted: Thu Jun 29, 2006 11:43 pm
by jvr_3jv
Hi Designer,

An in-link sort is always tied to the memory allocated to the link, which is relatively small. It can cope only up to that memory limit, so for heavy volumes of data we have to go for an explicit sort. :P