Difference between Explicit sort and Sort on partition

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dsdesigner
Participant
Posts: 34
Joined: Thu Jul 29, 2004 1:03 pm

Post by dsdesigner »

This question has been lingering in my mind for a long time, so I thought it best to ask the forum for opinions.

This question relates to stages like Join and Aggregator, where the data needs to be partitioned and sorted. In the Join stage chapter of the Parallel Job Developer's Guide, an explicit Sort stage is used to sort the data before the join. I have religiously followed the guide in my development and placed an explicit Sort stage before the join. Yet some of the developers I have come across do not use an explicit Sort stage; instead they sort each partition on the Join stage itself. My experiments with both methods (sorting with an explicit Sort stage and sorting on the partition) yield correct results.
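To see why a join wants its inputs both partitioned and sorted on the key, here is a minimal Python sketch of a partitioned sort-merge join. It is an illustration only, not DataStage's implementation: `hash_partition`, `merge_join`, and `parallel_join` are hypothetical names, and Python's built-in `hash()` stands in for the engine's hash partitioner. Partitioning both inputs on the join key sends matching rows to the same partition; sorting each partition then lets the join walk both inputs in a single forward pass. Since the join only relies on each partition being ordered, this may explain why both methods give correct results.

```python
from operator import itemgetter
from typing import Any, Iterator

Row = tuple[Any, ...]


def hash_partition(rows: list[Row], key: int, n: int) -> list[list[Row]]:
    """Send each row to one of n partitions by hashing its join key."""
    parts: list[list[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def merge_join(left: list[Row], right: list[Row], key: int) -> Iterator[Row]:
    """Inner-join two inputs already sorted on `key` in one forward pass."""
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1  # left key too small: advance left
        elif lk > rk:
            j += 1  # right key too small: advance right
        else:
            # Find the run of right rows sharing this key ...
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # ... and pair it with every left row sharing the same key.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    yield left[i] + r
                i += 1
            j = j_end


def parallel_join(left: list[Row], right: list[Row], key: int, n: int) -> list[Row]:
    """Partition both inputs identically, sort each partition, then merge-join."""
    by_key = itemgetter(key)
    out: list[Row] = []
    for lp, rp in zip(hash_partition(left, key, n), hash_partition(right, key, n)):
        # Sort each partition on the key; merge_join relies only on this order.
        out.extend(merge_join(sorted(lp, key=by_key), sorted(rp, key=by_key), key))
    return out
```

If `merge_join` is fed unsorted partitions it can silently miss matches, which is why the sort step is needed regardless of where it is placed.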

My question is: which method is preferred? In other words, which would yield better performance, and why? What is the difference between the two methods?

I would greatly appreciate your thoughts on the above questions.

Thanks in advance.

dsdesigner
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi dsdesigner,
This has been discussed many times before.
An explicit sort is always recommended as far as performance is concerned. The main reason is buffering. If the sort is done through the link option on a stage like Join, both inputs have to be buffered in full in order to be sorted, and only then joined. That won't give you much performance benefit compared with an explicit Sort stage, which is specially designed to handle large volumes of data using temporary scratch files.
But all this is more relevant when you have a large amount of data. For low volumes, it may be fine to sort on the partition, because this approach avoids an unnecessary stage and its associated memory allocation, etc.
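Sorting with temporary scratch files, as described above, is the classic external merge sort. Below is a minimal Python sketch, assuming a hypothetical `memory_rows` budget standing in for the sort's memory limit and a caller-supplied `scratch_dir` standing in for scratch disk; it is not how DataStage implements either kind of sort. Data that fits within the budget is sorted purely in memory; larger inputs are spilled to disk as sorted runs and then k-way merged with `heapq.merge`.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator


def _spill_run(sorted_chunk: list[int], scratch_dir: str) -> str:
    """Write one sorted run to a temporary scratch file and return its path."""
    fd, path = tempfile.mkstemp(dir=scratch_dir, suffix=".run")
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{v}\n" for v in sorted_chunk)
    return path


def _read_run(path: str) -> Iterator[int]:
    """Stream a sorted run back from its scratch file."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(values: Iterable[int], memory_rows: int, scratch_dir: str) -> Iterator[int]:
    """Sort `values`, buffering at most `memory_rows` of them before spilling."""
    runs: list[str] = []
    chunk: list[int] = []
    for v in values:
        chunk.append(v)
        if len(chunk) >= memory_rows:  # memory budget reached: spill a sorted run
            chunk.sort()
            runs.append(_spill_run(chunk, scratch_dir))
            chunk = []
    if not runs:
        # Everything fit in memory: no scratch files needed.
        yield from sorted(chunk)
        return
    if chunk:
        chunk.sort()
        runs.append(_spill_run(chunk, scratch_dir))
    try:
        # k-way merge of the sorted runs, reading each file sequentially.
        yield from heapq.merge(*(_read_run(p) for p in runs))
    finally:
        for p in runs:
            os.remove(p)
```

With `memory_rows=100`, a 2,500-row input is spilled as 25 sorted runs and merged back in order, while a three-row input never touches disk. Raising the budget trades memory for fewer scratch files, which mirrors the low-volume versus high-volume distinction in the post.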
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
sanjay
Premium Member
Posts: 203
Joined: Fri Apr 23, 2004 2:22 am

Post by sanjay »

Hi Kumar,
I need to know: does sort on partition not use a temporary scratch file?
Is it only the Sort stage that uses temporary scratch files?

Sanjay
jvr_3jv
Participant
Posts: 2
Joined: Tue Jun 20, 2006 9:00 am

Re: Difference between Explicit sort and Sort on partition

Post by jvr_3jv »

Hi Designer,

An in-link sort is always limited to the memory allocated to the link, and that is relatively small. It can cope only as long as the data fits in that memory; for heavy volumes of data we have to go for an explicit sort. :P