External Sort stage v/s sort option in each stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
adasgupta123
Participant
Posts: 42
Joined: Fri Oct 20, 2006 1:58 am

External Sort stage v/s sort option in each stage

Post by adasgupta123 »

Hi,

In my jobs there many joins and lookups.I want sort the main and reference data .

For main link the no. of records varies from 200000 to 60000000 for each join.

For reference link no. of records varies from 20000 rows to 20000000.

I want to know which one is preferable external sort stage or the implicit sort option which is associated with every stage from perfofmance point of view?

Regards

Avik
mantums
Participant
Posts: 6
Joined: Wed Aug 30, 2006 10:14 pm
Location: Channai

Re: External Sort stage v/s sort option in each stage

Post by mantums »

If data is huge u have to go for external sort stage because it uses temp. space (scrach disk) for sorting the data.

regards
mahantesh

adasgupta123 wrote:Hi,

In my jobs there many joins and lookups.I want sort the main and reference data .

For main link the no. of records varies from 200000 to 60000000 for each join.

For reference link no. of records varies from 20000 rows to 20000000.

I want to know which one is preferable external sort stage or the implicit sort option which is associated with every stage from perfofmance point of view?

Regards

Avik
aakashahuja
Premium Member
Premium Member
Posts: 210
Joined: Wed Feb 16, 2005 7:17 am

Post by aakashahuja »

Although i am not sure about this but my best guess would be that even if you use an external sort stage, then PX might combine the operators at run time if you are doing nothing else.

So in that case, it would be better NOT to use the external sort stage. But as i said, i might be wrong in my understanding, so please confirm it.

Another way to confirm this would be put a sort stage in the job and then view the DUMP_SCORE. It would tell you if PX is combining the sort operator. If so then go without sort or else otherwise.
Nageshsunkoji
Participant
Posts: 222
Joined: Tue Aug 30, 2005 2:07 am
Location: pune
Contact:

Post by Nageshsunkoji »

In Datastage, we have two options of sorting, One is Link sorting and another is Explicit sort stage.

Both methods will use the same tsort operator that means same scratch disk. So, explicit sort and link sort will utilise the same scratch disk for sorting purpose. Link sort is giving only few optuins to you like Satble and Unique options, where as explicit sort satge providing lot of options like Change key cluster mode and other options like sort method (Datastage,Unix) . Until and Unless you have a seperate requirement to use a explicit sort stage go for that other wise do sort link to reduce the number of operators. But this will vary based on your system setup. Try for link sort first.
NageshSunkoji

If you know anything SHARE it.............
If you Don't know anything LEARN it...............
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India
Contact:

Post by thebird »

aakashahuja wrote: but my best guess would be that even if you use an external sort stage, then PX might combine the operators at run time if you are doing nothing else.

So in that case, it would be better NOT to use the external sort stage. But as i said, i might be wrong in my understanding, so please confirm it.
Aakash,

The advantage of the Sort stage compared to the in-link sort is in its flexibility and the many options that it provides.
Combination of operators (tsort in this case) should not be a criteria for choosing Sort stage or In-link sort ( i would say for any stage) - becuase you have lots of environment variables that can control/restrict the combination especially for the Sorting operation.

Aneesh
Post Reply