Comparison of Sort Stage and Order by clause

Roopanwita
Participant
Posts: 125
Joined: Mon Sep 11, 2006 4:22 am
Location: India

Comparison of Sort Stage and Order by clause

Post by Roopanwita »

Hi,
Can anyone tell me why an ORDER BY clause gives better performance than the DataStage Sort stage? Basically, how does the DS Sort stage work?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Define "performance" in an ETL context.

The ORDER BY clause is possibly helped by an index. B-tree indexes are already stored in sorted order.

DataStage has no such help.
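
To illustrate the index point, here is a minimal sketch using SQLite from Python. It is only an analogy: the table `orders`, the column `cust_key` and the index `idx_cust_key` are made-up names, and your source database may plan the query differently. It asks the engine for its query plan for the same ORDER BY with and without an index on the sort column.

```python
import sqlite3


def order_by_plan(with_index):
    """Return SQLite's query-plan detail strings for an ORDER BY query.

    Table, column and index names are hypothetical, chosen for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_key INTEGER)")
    # Load the rows in a scrambled key order so a sort is genuinely needed.
    conn.executemany(
        "INSERT INTO orders (cust_key) VALUES (?)",
        [((i * 37) % 1000,) for i in range(1000)],
    )
    if with_index:
        # A B-tree index keeps cust_key in sorted order already.
        conn.execute("CREATE INDEX idx_cust_key ON orders (cust_key)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT cust_key FROM orders ORDER BY cust_key"
    ).fetchall()
    conn.close()
    # The human-readable plan text is the fourth column of each row.
    return [row[3] for row in rows]


if __name__ == "__main__":
    print("without index:", order_by_plan(False))
    print("with index:   ", order_by_plan(True))
```

Without the index the plan should include an explicit sort step (SQLite reports it as a temporary B-tree for the ORDER BY); with the index the rows are read in index order and that step disappears. The exact plan wording varies between SQLite versions.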
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Roopanwita

Performance

Post by Roopanwita »

So having an index will improve performance, and the SQL ORDER BY clause should therefore perform better than the DS Sort stage. But in a real scenario, in one of my jobs, the DS Sort stage is performing better than the ORDER BY clause. So I just want to know how the DS Sort stage works internally (i.e. the internal processes involved in sorting).
ray.wurlod

Post by ray.wurlod »

That suggests that the ORDER BY column is not indexed.

No information is published on the sorting algorithm used by DataStage other than "it's faster than UNIX sort command".

I suspect it's a multi-threaded heap-merge sort, but can offer no proof.
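
For readers unfamiliar with the term, the following is a minimal sketch of what a multi-threaded heap-merge sort generally looks like. It is not DataStage's actual algorithm, which is unpublished; the function names, `chunk_size` and `workers` are all assumptions made for illustration. The input is split into chunks, each chunk is sorted by a worker thread, and the sorted runs are combined with a heap-based k-way merge.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def chunked(records, size):
    """Yield consecutive slices of `records`, each at most `size` long."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def heap_merge_sort(records, chunk_size=4, workers=2):
    """Sort `records` by sorting chunks in threads, then heap-merging the runs.

    Illustrative only: chunk_size and workers are arbitrary assumptions.
    """
    chunks = list(chunked(records, chunk_size))
    # Phase 1: sort each chunk independently to produce sorted runs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(sorted, chunks))
    # Phase 2: k-way merge of the runs; heapq.merge uses a heap internally.
    return list(heapq.merge(*runs))


if __name__ == "__main__":
    print(heap_merge_sort([5, 3, 9, 1, 7, 2, 8]))
```

In CPython the threads here mostly demonstrate the structure rather than deliver a speed-up for pure in-memory sorting. In an engine that sorts by partition and spills runs to scratch disk, the same two-phase shape is what would let the work be spread out.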