Sort options

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

They are not the same. Stable sort can apply even when you are not using either of the "Don't sort" options. Stable sort preserves existing order of rows when the sort keys match. This needs more memory than not doing a stable sort.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
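To illustrate what "preserves existing order of rows when the sort keys match" means, here is a minimal sketch in Python rather than DataStage. It relies on the fact that Python's built-in `sorted` is a stable sort; the row layout (a key plus an arrival sequence number) and the function name are made up for the example and say nothing about how the Sort stage is implemented.

```python
# Each row is (sort_key, arrival_sequence). The sequence number is only
# there so we can see what happened to the arrival order.
rows = [("B", 1), ("A", 2), ("B", 3), ("A", 4), ("A", 5)]


def stable_sort_by_key(rows):
    """Sort on the key only. Python's sorted() is stable, so rows whose
    keys match keep the relative order in which they arrived."""
    return sorted(rows, key=lambda row: row[0])


result = stable_sort_by_key(rows)
for row in result:
    print(row)
```

Within each key value the arrival sequence numbers stay ascending (2, 4, 5 for "A" and 1, 3 for "B"). A sort that is not stable is free to emit rows with matching keys in any order.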
gagan8877
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm

Therefore,

Post by gagan8877 »

ray.wurlod wrote:They are not the same. Stable sort can apply even when you are not using either of the "Don't sort" options. Stable sort preserves existing order of rows when the sort keys match. This needs more memory than not doing a stable sort.
Thanks Ray.
So, as I understand it, Don't Sort means leave the dataset alone and don't do anything (but that contradicts the secondary key sorting scenario: don't sort by the primary key because it is already sorted, and sort only by the secondary key, if any), while Stable Sort means sort the dataset and preserve the order of duplicate keys?

Is that correct?
Gary
"A journey of a thousand miles, begins with one step"
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Not correct. Stable sort means preserve the ARRIVAL ORDER OF ROWS - nothing to do with keys at all.
gagan8877
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm

Post by gagan8877 »

ray.wurlod wrote:Not correct. Stable sort means preserve the ARRIVAL ORDER OF ROWS - nothing to do with keys at all.
My bad - I meant duplicate key values.

1. So let's say I have a dataset that is not sorted in the upstream stage and I set Stable Sort in the downstream stage - will it be sorted, with the stage trying to preserve the order of rows that have the same key values? Y/N

2. For a dataset that is already sorted in the upstream stage, if I set Stable Sort in the downstream stage - will it try to re-sort the data while preserving the order of rows that have the same key values (and that's why it is expensive), or will it not even try?

3. Don't Sort means leave the dataset alone, don't even try to sort. If the upstream stage sorted on the same primary and secondary keys, then leave it alone - don't sort at all. If the upstream stage sorted only on the primary key and not on the secondary keys, then sort only on the secondary keys, since the primary key is already sorted? Y/N - please explain if No.

Thanks
Gary
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

1. No, setting stable sort alone will not cause the data to be sorted. However, this option does not become available unless you choose to sort the data (and specify sort key(s)) - it is THIS that gets the data sorted (whether a stable sort or not).

2. See answer to 1.

3. You specify this property for each sort key. If you specify don't sort for all sort keys, then the output order of rows is exactly the input order of rows (and you've wasted some resources).
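As a rough analogy in Python (not DataStage, and not a description of the Sort stage's internals), the per-key "don't sort" idea from this thread can be sketched as follows. The assumption is that a key marked "don't sort" is taken as already ordered in the input, so only the remaining key is sorted, within each run of rows that share the already-sorted key. The function name and the `cust`/`amount` columns are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def sort_within_groups(rows, presorted_key, sort_key):
    """Assume rows already arrive ordered on presorted_key ("don't sort").
    Only sort_key is sorted, separately inside each run of equal
    presorted_key values, so the whole input never has to be re-sorted."""
    out = []
    for _, group in groupby(rows, key=itemgetter(presorted_key)):
        out.extend(sorted(group, key=itemgetter(sort_key)))
    return out


# Input is already ordered on "cust" but not on "amount".
rows = [
    {"cust": 1, "amount": 30},
    {"cust": 1, "amount": 10},
    {"cust": 2, "amount": 25},
    {"cust": 2, "amount": 5},
]

result = sort_within_groups(rows, "cust", "amount")
for row in result:
    print(row)
```

If every key were treated as already sorted there would be nothing left to sort inside the groups, which matches the point above that the output order is then exactly the input order.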
gagan8877
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm

Post by gagan8877 »

ray.wurlod wrote:1. No, setting stable sort alone will not cause the data to be sorted. However, this option does not become available unless you choose to sort the data (and specify sort key(s)) - it is THIS that gets the data sorted (whether a stable sort or not).

2. See answer to 1.

3. You specify this property for each sort key. If you specify don't sort for all sort keys, then the output order of rows is exactly the input order of rows (and you've wasted some resources).
Thanks Ray. This explanation was great - it's much clearer now.
Gary