Sort VS. AGGR


ds_is_fun
Premium Member
Posts: 194
Joined: Fri Jan 07, 2005 12:00 pm


Post by ds_is_fun »

Hi,
I currently have a design where I need to sort and then aggregate.
Do you recommend sorting inside the Aggregator stage, or a Sort stage followed by an Aggregator stage? And why?
Thanks!
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I would always sort before the Aggregator. Which sort method to use really depends on your incoming data: for example, if it comes from a table you can use the database's own sort mechanism, while a flat file might be better sorted outside of DS.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

I agree with ArndW. The Aggregator stage is best left to do its job alone.
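
To illustrate why sorting ahead of the aggregation can help, here is a minimal Python sketch (not DataStage code). It assumes made-up rows of `(key, amount)` pairs; the function names are hypothetical. With input already sorted by key, each group can be emitted as soon as the key changes, while unsorted input forces the aggregation to hold a running total for every key until the last row has been read.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows):
    """Sum amounts per key, assuming rows arrive already sorted by key.

    Only one group is held at a time, so memory use does not grow
    with the number of distinct keys.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)


def aggregate_hashed(rows):
    """Sum amounts per key for rows in any order.

    Every distinct key stays in the dictionary until the input ends.
    """
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return totals


# Illustrative data only.
rows = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]

sorted_result = list(aggregate_sorted(sorted(rows, key=itemgetter(0))))
hashed_result = aggregate_hashed(rows)
```

Both approaches give the same totals; the difference is in how much state the aggregation step has to keep, which is the usual argument for feeding it sorted data.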
ds_is_fun
Premium Member
Posts: 194
Joined: Fri Jan 07, 2005 12:00 pm

Post by ds_is_fun »

Wouldn't it be better to use the parallelism of the Sort stage in DS instead of sorting the flat file outside of DS?
My understanding is that a sort outside of DS does not use the PX mechanism. I'm assuming that partitioning and sorting in the Sort stage would be faster.
Thanks! Please reply!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Do you need to sort the entire DataSet? Or just the data on each partition? In the latter case, the PX sort may well be faster.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
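
The distinction raised in the last reply, sorting the whole data set versus sorting only within each partition, can be sketched in plain Python (again, not DataStage code). The sketch assumes rows of `(key, amount)` pairs and uses a CRC32-based partitioner as a stand-in for hash partitioning on the grouping key; all names here are hypothetical. Because every row for a given key lands in the same partition, each partition can be sorted and aggregated on its own, and no global sort is needed.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def partition_by_key(rows, num_partitions):
    """Route each row to a partition chosen by a hash of its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, amount in rows:
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[index].append((key, amount))
    return partitions


def sort_and_aggregate(partition):
    """Sort one partition by key, then sum amounts per key."""
    ordered = sorted(partition, key=itemgetter(0))
    return [
        (key, sum(amount for _, amount in group))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


# Illustrative data only.
rows = [("b", 2), ("a", 1), ("c", 7), ("b", 3), ("a", 4)]

partitions = partition_by_key(rows, 3)
# Each partition is independent, so this step could run in parallel.
per_partition = [sort_and_aggregate(p) for p in partitions]
totals = dict(pair for part in per_partition for pair in part)
```

If the output also has to be in a single overall key order, the per-partition results would still need a merge afterwards; for aggregation alone, the per-partition sort is enough.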