Page 1 of 1

External Sort stage v/s sort option in each stage

Posted: Fri Dec 08, 2006 1:41 am
by adasgupta123
Hi,

In my jobs there many joins and lookups.I want sort the main and reference data .

For main link the no. of records varies from 200000 to 60000000 for each join.

For reference link no. of records varies from 20000 rows to 20000000.

I want to know which one is preferable external sort stage or the implicit sort option which is associated with every stage from perfofmance point of view?

Regards

Avik

Re: External Sort stage v/s sort option in each stage

Posted: Fri Dec 08, 2006 3:03 am
by mantums
If data is huge u have to go for external sort stage because it uses temp. space (scrach disk) for sorting the data.

regards
mahantesh

adasgupta123 wrote:Hi,

In my jobs there many joins and lookups.I want sort the main and reference data .

For main link the no. of records varies from 200000 to 60000000 for each join.

For reference link no. of records varies from 20000 rows to 20000000.

I want to know which one is preferable external sort stage or the implicit sort option which is associated with every stage from perfofmance point of view?

Regards

Avik

Posted: Fri Dec 08, 2006 3:05 am
by aakashahuja
Although i am not sure about this but my best guess would be that even if you use an external sort stage, then PX might combine the operators at run time if you are doing nothing else.

So in that case, it would be better NOT to use the external sort stage. But as i said, i might be wrong in my understanding, so please confirm it.

Another way to confirm this would be put a sort stage in the job and then view the DUMP_SCORE. It would tell you if PX is combining the sort operator. If so then go without sort or else otherwise.

Posted: Fri Dec 08, 2006 3:45 am
by Nageshsunkoji
In Datastage, we have two options of sorting, One is Link sorting and another is Explicit sort stage.

Both methods will use the same tsort operator that means same scratch disk. So, explicit sort and link sort will utilise the same scratch disk for sorting purpose. Link sort is giving only few optuins to you like Satble and Unique options, where as explicit sort satge providing lot of options like Change key cluster mode and other options like sort method (Datastage,Unix) . Until and Unless you have a seperate requirement to use a explicit sort stage go for that other wise do sort link to reduce the number of operators. But this will vary based on your system setup. Try for link sort first.

Posted: Fri Dec 08, 2006 10:44 am
by thebird
aakashahuja wrote: but my best guess would be that even if you use an external sort stage, then PX might combine the operators at run time if you are doing nothing else.

So in that case, it would be better NOT to use the external sort stage. But as i said, i might be wrong in my understanding, so please confirm it.
Aakash,

The advantage of the Sort stage compared to the in-link sort is in its flexibility and the many options that it provides.
Combination of operators (tsort in this case) should not be a criteria for choosing Sort stage or In-link sort ( i would say for any stage) - becuase you have lots of environment variables that can control/restrict the combination especially for the Sorting operation.

Aneesh