Inline sort or external sort
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 150
- Joined: Tue Mar 13, 2007 1:17 am
Inline sort or external sort
Hi,
I have few below mentioned questions. could any one help me find the answers please?
1. What is the difference between inline sort and external sort?
2. If my data is sorted and then some lookups/aggregations are performed then will my data needs to be sorted once more or will it remain sorted in the parallel environment?
Thanks
I have few below mentioned questions. could any one help me find the answers please?
1. What is the difference between inline sort and external sort?
2. If my data is sorted and then some lookups/aggregations are performed then will my data needs to be sorted once more or will it remain sorted in the parallel environment?
Thanks
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
An inline sort is where you specify sorting on the Partitioning tab of a stage's input link. An external sort is, presumably, where you use an explicit Sort stage. The latter gives greater flexibility, including controlling memory allocated to sorting, generation of key change indicators, and performing sub-sorts (not sorting already sorted/grouped columns).
Lookups should not affect sorted order. Aggregations should not affect sorted order provided that it is the grouping columns that are pre-sorted and you use Sort (not Hash Table) as the aggregation method.
Lookups should not affect sorted order. Aggregations should not affect sorted order provided that it is the grouping columns that are pre-sorted and you use Sort (not Hash Table) as the aggregation method.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 150
- Joined: Tue Mar 13, 2007 1:17 am
Yes, it might have a performance impact. As Ray has already pointed out, you can change a number of parameters using an explicit sort; and these changes can impact performance.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Premium Member
- Posts: 236
- Joined: Sun Apr 01, 2007 7:41 am
- Location: Michigan
hash table
What is the difference of using sort and hash in aggregator. i heard if there are more than 1000 rows per mb use sort. But it didnt look logical . please explain in logical termsray.wurlod wrote:An inline sort is where you specify sorting on the Partitioning tab of a stage's input link. An external sort is, presumably, where you use an explicit Sort stage. The latter gives greater flexibility, including controlling memory allocated to sorting, generation of key change indicators, and performing sub-sorts (not sorting already sorted/grouped columns).
Lookups should not affect sorted order. Aggregations should not affect sorted order provided that it is the grouping columns that are pre-sorted and you use Sort (not Hash Table) as the aggregation method.
-
- Premium Member
- Posts: 236
- Joined: Sun Apr 01, 2007 7:41 am
- Location: Michigan
hash table
What is the difference of using sort and hash in aggregator. i heard if there are more than 1000 rows per mb use sort. But it didnt look logical . please explain in logical termsray.wurlod wrote:An inline sort is where you specify sorting on the Partitioning tab of a stage's input link. An external sort is, presumably, where you use an explicit Sort stage. The latter gives greater flexibility, including controlling memory allocated to sorting, generation of key change indicators, and performing sub-sorts (not sorting already sorted/grouped columns).
Lookups should not affect sorted order. Aggregations should not affect sorted order provided that it is the grouping columns that are pre-sorted and you use Sort (not Hash Table) as the aggregation method.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
It's a "rule of thumb" rather than a hard and fast rule. The hash table, in which every distinct combination of key values plus all calculated/counted output columns are stored, must fit in physical memory. If they won't, use Sort. If you're not sure, use Sort. With Sort, memory can be freed (and one or more rows output) every time one of the grouping fields changes value.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.