Inline sort or external sort

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
hitmanthesilentassasin
Participant
Posts: 150
Joined: Tue Mar 13, 2007 1:17 am

Inline sort or external sort

Post by hitmanthesilentassasin »

Hi,

I have few below mentioned questions. could any one help me find the answers please?
1. What is the difference between inline sort and external sort?
2. If my data is sorted and then some lookups/aggregations are performed then will my data needs to be sorted once more or will it remain sorted in the parallel environment?

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

An inline sort is where you specify sorting on the Partitioning tab of a stage's input link. An external sort is, presumably, where you use an explicit Sort stage. The latter gives greater flexibility, including controlling memory allocated to sorting, generation of key change indicators, and performing sub-sorts (not sorting already sorted/grouped columns).

Lookups should not affect sorted order. Aggregations should not affect sorted order provided that it is the grouping columns that are pre-sorted and you use Sort (not Hash Table) as the aggregation method.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
hitmanthesilentassasin
Participant
Posts: 150
Joined: Tue Mar 13, 2007 1:17 am

Post by hitmanthesilentassasin »

Would that have any of the sort method have any performance issue?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Yes, it might have a performance impact. As Ray has already pointed out, you can change a number of parameters using an explicit sort; and these changes can impact performance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Define "performance" in an ETL context.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
just4u_sharath
Premium Member
Premium Member
Posts: 236
Joined: Sun Apr 01, 2007 7:41 am
Location: Michigan

hash table

Post by just4u_sharath »

ray.wurlod wrote:An inline sort is where you specify sorting on the Partitioning tab of a stage's input link. An external sort is, presumably, where you use an explicit Sort stage. The latter gives greater flexibility, including controlling memory allocated to sorting, generation of key change indicators, and performing sub-sorts (not sorting already sorted/grouped columns).

Lookups should not affect sorted order. Aggregations should not affect sorted order provided that it is the grouping columns that are pre-sorted and you use Sort (not Hash Table) as the aggregation method.
What is the difference of using sort and hash in aggregator. i heard if there are more than 1000 rows per mb use sort. But it didnt look logical . please explain in logical terms
just4u_sharath
Premium Member
Premium Member
Posts: 236
Joined: Sun Apr 01, 2007 7:41 am
Location: Michigan

hash table

Post by just4u_sharath »

ray.wurlod wrote:An inline sort is where you specify sorting on the Partitioning tab of a stage's input link. An external sort is, presumably, where you use an explicit Sort stage. The latter gives greater flexibility, including controlling memory allocated to sorting, generation of key change indicators, and performing sub-sorts (not sorting already sorted/grouped columns).

Lookups should not affect sorted order. Aggregations should not affect sorted order provided that it is the grouping columns that are pre-sorted and you use Sort (not Hash Table) as the aggregation method.
What is the difference of using sort and hash in aggregator. i heard if there are more than 1000 rows per mb use sort. But it didnt look logical . please explain in logical terms
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It's a "rule of thumb" rather than a hard and fast rule. The hash table, in which every distinct combination of key values plus all calculated/counted output columns are stored, must fit in physical memory. If they won't, use Sort. If you're not sure, use Sort. With Sort, memory can be freed (and one or more rows output) every time one of the grouping fields changes value.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply