I tried searching the forum but was not able to find the exact answer I am looking for.
I need to know: what does "Don't Sort (Previously Grouped)" mean?
My understanding is that if the data was partitioned on the keys in one of the previous stages, this mode is the right option for those keys.
So I can specify this option for the subset of keys that match the partition keys, and "Sort" the rest of the keys.
Is that true?
Thanks!
Sort Key mode
Hi,
As I understand it, this option mainly helps reduce the memory consumption of the Sort stage. If your input data is already grouped on a subset of the keys and you do not need the groups themselves to be in sorted order, apply this option to those keys. The Sort stage will then output records at the end of each group, or at the end of the data, instead of waiting until all input has been read.
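To illustrate the idea, here is a minimal Python sketch of sorting within groups that are already adjacent in the input. It is only a model of the behaviour described above, not DataStage's implementation; the function name `sort_within_groups` and the sample rows are made up for the example. Only one group is buffered at a time, which is where the memory saving comes from.

```python
from itertools import groupby
from operator import itemgetter


def sort_within_groups(rows, group_key, sort_key):
    """Yield rows sorted by sort_key inside each consecutive run of group_key.

    Assumes rows sharing a group_key value are already adjacent (grouped).
    The groups themselves do not have to arrive in sorted order.
    """
    for _, group in groupby(rows, key=group_key):
        # Only the current group is held in memory while it is sorted.
        yield from sorted(group, key=sort_key)


# Hypothetical input: grouped on the first column, unsorted on the second.
rows = [("B", 3), ("B", 1), ("A", 2), ("A", 1), ("C", 5)]
result = list(sort_within_groups(rows, itemgetter(0), itemgetter(1)))
print(result)
```

Note that the groups come out in their original order (B, A, C); only the rows inside each group are reordered.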
Thanks
Prasoon
ETL Consultant
LinkedIn :- http://www.linkedin.com/profile/view?id ... ab_pro_top
Blog:- http://dsshar.blogspot.com/
Hi dsquest,
You have to be careful not to mix two entirely different concepts here.
Grouping and partitioning are by no means the same. When partitioning data by certain columns, it is common to also sort the data on the same columns you used for partitioning. But this is just an option. You can use hash partitioning or modulus partitioning on your data and leave it unsorted.
To perform aggregations within DataStage, it is a requirement that the data be sorted by the grouping keys. So if you have already used five columns as group keys in an Aggregator upstream, you do not need to sort by those columns again, because your data is already in the correct sort order. The Aggregator should leave you with a single row for each of these groups, so sorting on any other column should have no effect on your data. Of course, you might multiply the number of rows sharing the same keys by using a Transformer loop variable, by joining another stream on the same keys in the same order (to avoid re-sorting and repartitioning), or by a lookup that returns multiple rows per input row. In any of these cases you can use the "Don't sort - already grouped" option.
When using the options "Don't sort - already grouped" or "Don't sort - already sorted", DataStage will verify the sort order on the respective columns and abort your job if it detects rows out of order. The same can be achieved by setting the environment variable "APT_NO_SORT_INSERTION_CHECK_ONLY" to "True". DataStage will then not insert sort operators ahead of operators that require sorted input.
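The check-only behaviour described above can be pictured with a short Python sketch. This is an assumption-laden illustration, not DataStage code: the function `check_sorted` is hypothetical, and raising `ValueError` merely stands in for the job aborting. Rows pass through unchanged as long as the key never decreases.

```python
def check_sorted(rows, key):
    """Pass rows through unchanged, raising ValueError on the first
    row whose key is lower than the key of the row before it."""
    previous = None
    first = True
    for row in rows:
        current = key(row)
        if not first and current < previous:
            # Stand-in for the job abort described in the post.
            raise ValueError(f"row out of order: {row!r}")
        previous = current
        first = False
        yield row


# Hypothetical sample data: already sorted on the first column.
rows_ok = [(1, "a"), (1, "b"), (2, "c")]
checked = list(check_sorted(rows_ok, key=lambda r: r[0]))
```

Because no sorting is done, the cost is a single comparison per row and no buffering, but unsorted input is rejected rather than repaired.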
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon