Page 1 of 1

Inline sort or external sort

Posted: Mon Dec 17, 2007 12:26 am
by hitmanthesilentassasin
Hi,

I have few below mentioned questions. could any one help me find the answers please?
1. What is the difference between inline sort and external sort?
2. If my data is sorted and then some lookups/aggregations are performed then will my data needs to be sorted once more or will it remain sorted in the parallel environment?

Thanks

Posted: Mon Dec 17, 2007 1:32 am
by ray.wurlod
An inline sort is where you specify sorting on the Partitioning tab of a stage's input link. An external sort is, presumably, where you use an explicit Sort stage. The latter gives greater flexibility, including controlling memory allocated to sorting, generation of key change indicators, and performing sub-sorts (not sorting already sorted/grouped columns).

Lookups should not affect sorted order. Aggregations should not affect sorted order provided that it is the grouping columns that are pre-sorted and you use Sort (not Hash Table) as the aggregation method.

Posted: Mon Dec 17, 2007 4:10 am
by hitmanthesilentassasin
Would that have any of the sort method have any performance issue?

Posted: Mon Dec 17, 2007 4:27 am
by ArndW
Yes, it might have a performance impact. As Ray has already pointed out, you can change a number of parameters using an explicit sort; and these changes can impact performance.

Posted: Mon Dec 17, 2007 5:36 am
by ray.wurlod
Define "performance" in an ETL context.

hash table

Posted: Mon Jan 14, 2008 8:11 pm
by just4u_sharath
ray.wurlod wrote:An inline sort is where you specify sorting on the Partitioning tab of a stage's input link. An external sort is, presumably, where you use an explicit Sort stage. The latter gives greater flexibility, including controlling memory allocated to sorting, generation of key change indicators, and performing sub-sorts (not sorting already sorted/grouped columns).

Lookups should not affect sorted order. Aggregations should not affect sorted order provided that it is the grouping columns that are pre-sorted and you use Sort (not Hash Table) as the aggregation method.
What is the difference of using sort and hash in aggregator. i heard if there are more than 1000 rows per mb use sort. But it didnt look logical . please explain in logical terms

hash table

Posted: Mon Jan 14, 2008 8:14 pm
by just4u_sharath
ray.wurlod wrote:An inline sort is where you specify sorting on the Partitioning tab of a stage's input link. An external sort is, presumably, where you use an explicit Sort stage. The latter gives greater flexibility, including controlling memory allocated to sorting, generation of key change indicators, and performing sub-sorts (not sorting already sorted/grouped columns).

Lookups should not affect sorted order. Aggregations should not affect sorted order provided that it is the grouping columns that are pre-sorted and you use Sort (not Hash Table) as the aggregation method.
What is the difference of using sort and hash in aggregator. i heard if there are more than 1000 rows per mb use sort. But it didnt look logical . please explain in logical terms

Posted: Mon Jan 14, 2008 9:11 pm
by ray.wurlod
It's a "rule of thumb" rather than a hard and fast rule. The hash table, in which every distinct combination of key values plus all calculated/counted output columns are stored, must fit in physical memory. If they won't, use Sort. If you're not sure, use Sort. With Sort, memory can be freed (and one or more rows output) every time one of the grouping fields changes value.