Sort Key mode

dsquest
Participant
Posts: 6
Joined: Wed Mar 27, 2013 1:24 pm


Post by dsquest »

I searched the forum but could not find the exact answer I am looking for.
What does "Don't Sort (Previously Grouped)" mean?
My understanding is that this mode is the right choice for keys that were already partitioned in a previous stage.
So I could specify this option for the subset of keys that match the partition keys, and "Sort" for the remaining keys.
Is that correct?

Thanks!
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi,
As I understand it, this option mainly helps reduce the memory consumption of the Sort stage. If your input data is already grouped on some of the keys and you do not need the groups themselves to be in sorted order, apply this option. The Sort stage then outputs records at each "end of group" (or at "end of data") rather than buffering the whole input first.
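
To illustrate the idea (this is only a conceptual sketch in Python, not how the Sort stage is implemented): assuming rows with the same group key already arrive next to each other, a sub-sort only needs to hold one group at a time. The function name `sort_within_groups` and the sample rows are made up for the example.

```python
from itertools import groupby
from operator import itemgetter

def sort_within_groups(rows, group_key, sort_key):
    """Yield rows sorted by sort_key inside each run of equal group_key values.

    Assumes rows sharing a group_key are already adjacent (grouped).
    The groups themselves do not have to arrive in sorted order.
    """
    for _, group in groupby(rows, key=group_key):
        # Only the current group is buffered; it is released at end of group.
        yield from sorted(group, key=sort_key)

rows = [
    ("B", 3), ("B", 1),   # group B arrives first: grouped, but not sorted
    ("A", 2), ("A", 1),
    ("C", 5),
]
result = list(sort_within_groups(rows, itemgetter(0), itemgetter(1)))
print(result)
```

The groups stay in their arrival order (B, A, C); only the second column is sorted inside each group.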
dsquest
Participant
Posts: 6
Joined: Wed Mar 27, 2013 1:24 pm

Post by dsquest »

Thanks Prasoon!

Does that mean it will not sort the grouped (partitioned) data?

Say I partitioned on 5 fields in the previous stage and now need to sort on 5+1 fields. Can I specify Sort for the new field and Don't Sort (Previously Grouped) for the other 5 fields?
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Hi dsquest,

You have to be careful not to mix up two entirely different concepts here.

Grouping and partitioning are by no means the same. When partitioning data by certain columns, it is common to also sort the data by the same columns you used for partitioning. But this is just an option. You can hash-partition or modulus-partition your data and leave it unsorted.
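
A small Python sketch of the difference (purely illustrative, with made-up data and a hypothetical `modulus_partition` helper): partitioning only decides which partition a row goes to. Rows with the same key end up in the same partition, but inside that partition they are neither sorted nor grouped.

```python
def modulus_partition(rows, key, num_partitions):
    """Distribute rows over partitions by key modulo num_partitions.

    Arrival order is preserved within each partition; nothing is sorted.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[key(row) % num_partitions].append(row)
    return partitions

rows = [(4, "a"), (1, "b"), (2, "c"), (4, "d"), (3, "e"), (2, "f")]
parts = modulus_partition(rows, key=lambda row: row[0], num_partitions=2)

# Keys as they appear inside each partition.
keys_per_partition = [[row[0] for row in part] for part in parts]
print(keys_per_partition)
```

In the first partition the keys arrive as 4, 2, 4, 2: all rows for key 4 are in one partition, yet they are not adjacent, so this data is partitioned but not grouped.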

Aggregating in DataStage requires that the data be sorted by the grouping keys. So if you have already used five columns as group keys in an Aggregator upstream, you do not need to sort by those columns again, because your data is already in the correct sort order. The Aggregator should also leave you with a single row for each of these groups, so sorting on any additional column should have no effect on your data. Of course, you might multiply the number of rows sharing these keys afterwards: by using a Transformer loop variable, by joining another stream on the same keys in the same order (to avoid re-sorting and repartitioning), or by a Lookup returning multiple rows per input row. In any of these cases you can use the "Don't Sort (Previously Grouped)" option.

When you use the options "Don't Sort (Previously Grouped)" or "Don't Sort (Previously Sorted)", DataStage verifies the order on the respective columns and aborts your job if it detects rows that violate it. A similar check is available for the sorts DataStage inserts automatically: setting the environment variable "APT_SORT_INSERTION_CHECK_ONLY" to "True" makes those inserted sorts only verify the sort order instead of sorting. The separate variable "APT_NO_SORT_INSERTION" stops DataStage from inserting sort operators at all ahead of operators that require sorted data.
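
The kind of check described here can be sketched in Python as follows. This is an assumption-laden illustration of "verify and abort", not DataStage's actual logic; the helper names `check_sorted` and `check_grouped` are hypothetical.

```python
def check_sorted(rows, key):
    """Pass rows through unchanged; raise if a key is lower than its predecessor."""
    previous = None
    for index, row in enumerate(rows):
        current = key(row)
        if index > 0 and current < previous:
            raise ValueError(f"row {index} out of order: {current!r} after {previous!r}")
        previous = current
        yield row

def check_grouped(rows, key):
    """Pass rows through unchanged; raise if the key of a finished group reappears."""
    finished = set()   # keys whose group has already ended
    previous = None
    for index, row in enumerate(rows):
        current = key(row)
        if index > 0 and current != previous:
            finished.add(previous)
            if current in finished:
                raise ValueError(f"row {index}: group {current!r} is not contiguous")
        previous = current
        yield row

def identity(row):
    return row

# Grouped but unsorted input passes the grouping check.
grouped_ok = list(check_grouped(["B", "B", "A", "A"], identity))

# The same input fails the stricter sorted check.
try:
    list(check_sorted(["B", "B", "A", "A"], identity))
    sorted_check_failed = False
except ValueError:
    sorted_check_failed = True
```

Data that is grouped but not sorted passes `check_grouped` and fails `check_sorted`, which mirrors the difference between the two "Don't Sort" modes.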