Sort Key mode

dsquest · Post by **dsquest** » Wed Mar 27, 2013 1:31 pm

I tried searching the forum but was not able to find the exact answer I am looking for.
I need to know: what does "Don't Sort (Previously Grouped)" mean?
What I understand is that if the keys were partitioned in one of the previous stages, this mode is the right option for them.
So I can specify this option for the subset of keys that match the partition keys, and "Sort" the rest of the keys.
Is that true?

Thanks!

prasson_ibm · Post by **prasson_ibm** » Wed Mar 27, 2013 2:30 pm

Hi,
According to my understanding, this is basically useful for reducing the memory consumption of the Sort stage. If your input data is already grouped on the sub-keys and you don't need the groups themselves in sorted order, you apply this option, so the Sort stage will output the records at "end of group" or "end of data".
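
To picture the memory point, here is a minimal Python sketch of sorting inside groups that are already adjacent in the input. It is only an illustration, not DataStage code: the function name `sort_within_groups` and the columns `cust` and `item` are made up for the example. Only one group is held in memory at a time, and it is released as soon as the group ends.

```python
from itertools import groupby
from operator import itemgetter


def sort_within_groups(rows, grouped_keys, sort_keys):
    """Yield rows sorted by sort_keys inside each run of equal grouped_keys.

    Assumes rows sharing the same grouped_keys values are already adjacent
    (grouped); the groups themselves need not arrive in sorted order.
    """
    group_of = itemgetter(*grouped_keys)
    order_of = itemgetter(*sort_keys)
    for _, group in groupby(rows, key=group_of):
        # Buffer a single group, sort it, and release it downstream
        # before reading the next group.
        yield from sorted(group, key=order_of)


# Hypothetical input: grouped on "cust" (B before A), unsorted on "item".
rows = [
    {"cust": "B", "item": 2},
    {"cust": "B", "item": 1},
    {"cust": "A", "item": 3},
    {"cust": "A", "item": 1},
]
result = list(sort_within_groups(rows, ["cust"], ["item"]))
```

The groups keep their incoming order (B, then A) and only the rows inside each group are reordered, which is the trade-off against a full sort on all keys.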

dsquest · Post by **dsquest** » Tue Apr 09, 2013 9:38 am

Thanks Prasoon!

Does that mean it will not sort the grouped (partitioned) data?

I just partitioned on, say, 5 fields in the previous stage. I need to sort on 5+1 fields now. Can I give Sort for the new field and Don't Sort (Previously Grouped) for the other 5 fields?

BI-RMA · Post by **BI-RMA** » Tue Apr 09, 2013 1:54 pm

Hi dsquest,

You have to be careful not to mix up two entirely different concepts here.

Grouping and partitioning are by no means the same. When partitioning data by certain columns, it is a frequently used option to also sort the data by the same columns you used for partitioning. But this is just an option. You can perfectly well use hash partitioning or modulus partitioning on your data and leave it unsorted.
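
As a rough illustration of that difference, the following Python sketch distributes rows with a modulus on an integer key and does nothing else. It is not DataStage code; `modulus_partition` and the `id` column are hypothetical names chosen for the example.

```python
def modulus_partition(rows, key, num_partitions):
    """Distribute rows across partitions by key modulo num_partitions.

    Rows keep their arrival order inside each partition; nothing is sorted.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[row[key] % num_partitions].append(row)
    return partitions


# Hypothetical input with an integer key column "id".
rows = [{"id": 7}, {"id": 2}, {"id": 5}, {"id": 4}, {"id": 1}]
parts = modulus_partition(rows, "id", 2)

# Partition 1 receives ids 7, 5, 1 in arrival order: partitioned, not sorted.
odd_ids = [row["id"] for row in parts[1]]
```

All rows with the same key land in the same partition, but their order within it is whatever order they arrived in, so a sort is still a separate step.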

To perform aggregations within DataStage, the data must be sorted by the grouping keys. So if you have already used five columns as group keys in an Aggregator upstream, you do not need to sort by those columns again, because your data is already in the correct sort order. This should leave you with a single row of data for each of these groups, so sorting on any other column should not have an effect on your data. Of course, you might multiply the number of rows sharing these same keys by using a Transformer loop variable, by joining another stream on the same keys in the same order (to avoid re-sorting and repartitioning), or by a lookup returning multiple rows per input row. In any of these cases you can use the "Don't Sort (Previously Grouped)" option.

When using the options "Don't Sort (Previously Grouped)" or "Don't Sort (Previously Sorted)", DataStage will verify the sort order on the respective columns and abort your job if it detects rows in improper sort order. A similar check can be applied to the sorts that DataStage inserts automatically, by setting the environment variable "APT_SORT_INSERTION_CHECK_ONLY" to "True". The sort operators inserted in preparation for operators that require sorted input will then only verify the sort order instead of actually sorting.
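
The verify-and-abort behaviour can be pictured with a small Python sketch. This only mirrors the idea and is not how DataStage implements it; `check_sorted`, `SortOrderError`, and the columns `a` and `b` are hypothetical names, and the check shown is for ascending sort order only.

```python
from operator import itemgetter


class SortOrderError(Exception):
    """Raised when a row arrives out of the expected sort order."""


def check_sorted(rows, keys):
    """Pass rows through unchanged, failing on the first out-of-order row."""
    key_of = itemgetter(*keys)
    previous = None
    for position, row in enumerate(rows):
        current = key_of(row)
        if position > 0 and current < previous:
            raise SortOrderError(
                f"row {position} has key {current!r} after {previous!r}"
            )
        previous = current
        yield row


# Hypothetical rows that really are sorted on ("a", "b").
good = [{"a": 1, "b": 1}, {"a": 1, "b": 2}, {"a": 2, "b": 0}]
passed = list(check_sorted(good, ["a", "b"]))

# Rows that are not sorted on "a" make the check abort.
bad = [{"a": 2, "b": 0}, {"a": 1, "b": 1}]
try:
    list(check_sorted(bad, ["a", "b"]))
    aborted = False
except SortOrderError:
    aborted = True
```

A check like this reads each row once and keeps only the previous key, which is why it is much cheaper than sorting again.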