Explicit Sort Stage vs. TSort Operator?

Posted: Thu Jan 17, 2013 6:22 pm
by kaps
Hi

I am joining two data sets using a Join stage. Both are hash partitioned on the join key, but neither is sorted. I believe the parallel framework inserts a tsort operator if the data is not sorted.

I see in some posts that it's better to add the Sort stage explicitly, but I am not sure of the reason. To me, an explicit Sort stage and an inserted tsort operator are both going to sort in the same way. Correct me if I am wrong...

Thanks

Posted: Thu Jan 17, 2013 7:59 pm
by ray.wurlod
All three methods (an explicit Sort stage, a link sort, and a framework-inserted sort) use the same tsort operator.

By using an explicit Sort stage you get more control over the amount of memory allocated for sorting, and you can generate Key Change columns if that's important to your processing.

You also get the ability to handle already-sorted data ("don't sort (previously sorted)" for example).
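
For anyone unsure what a Key Change column gives you, here is a rough sketch in Python rather than DataStage code. It assumes the rows are already sorted on the key, and it flags the first row of each key group with 1 and every other row with 0. The function name add_key_change and the cust_id/amt columns are made up for illustration.

```python
def add_key_change(rows, key):
    """Yield (row, key_change) pairs for rows already sorted on `key`.

    key_change is 1 on the first row of each key group, otherwise 0.
    """
    previous = object()  # sentinel that never equals a real key value
    for row in rows:
        current = row[key]
        yield row, int(current != previous)
        previous = current


# Hypothetical sample data; sorting first stands in for the Sort stage.
rows = [
    {"cust_id": 2, "amt": 5},
    {"cust_id": 1, "amt": 7},
    {"cust_id": 2, "amt": 1},
]
sorted_rows = sorted(rows, key=lambda r: r["cust_id"])
flags = [flag for _, flag in add_key_change(sorted_rows, "cust_id")]
```

Downstream logic can then detect group boundaries by testing the flag instead of comparing each row with the one before it.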

Posted: Tue Jan 22, 2013 1:39 pm
by kaps
Thanks Ray... So I don't have to put a Sort stage before the Join stage and sort on the key field if I am not worried about allocating memory or anything else, since DataStage is going to do that for me. Correct?

Posted: Tue Jan 22, 2013 2:33 pm
by ray.wurlod
You don't have to, but I include it amongst my "best practices" to do so, especially where the early join keys are already sorted.