Say I have three different type of aggregations to be done
Aggr1 = Groupkey1,Groupkey2,Groupkey3,
Aggr2 = Groupkey1,Groupkey2,Groupkey3,Groupkey4
Aggr3 = Groupkey1,Groupkey2,Groupkey3,Groupkey4,Groupkey5
Option 1:
Sort1 [GK1,GK2,GK3], Aggregate1 using GK1-3
Sort2 [GK1,GK2,GK3,GK4], Aggregate2 using GK1-4
Sort3 [GK1,GK2,GK3,GK4,GK5], Aggregate3 using GK1-5
Option 2:
Sort1 [GK1,GK2,GK3]--->Agg1
................................--->Sort2[GK4]-->Agg2
......................................................-->Sort3[GK5]-->Agg3
I guess Option 2 makes more sense for performance. What do you guys suggest??
Sort and Aggregation Options
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Or, if there's some reason not to sort by GK4 and GK5 up front (for example you may not have them until you've performed a lookup or join) then Option 2 is definitiely to be preferred - but you MUST preserve the sorting order of GK1 through GK3. Do this with an explicit Sort stage, with GK1 through GK3 specfied as "Don't sort (already sorted)".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.