Order By Clause or Explicit Sort Stage
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 284
- Joined: Fri Oct 13, 2006 4:31 am
Order By Clause or Explicit Sort Stage
I am joining two DB2 tables. Is it necessary to put an explicit Sort Stage before join or can I use the ORDER BY clause in the DB2 query itself and then directly link the DB2 stages with the Join stage without using Sort?
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
If you don't put in a Sort stage, DataStage will insert tsort operators, because the Join stage requires sorted inputs. If you have an ORDER BY clause, you can make the Sort stage specify "don't sort (already sorted)" so that it does nothing, but the existence of the Sort stage will prevent DataStage from inserting unnecessary tsort operators.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 284
- Joined: Fri Oct 13, 2006 4:31 am
Join Stage basically needs the input to be sorted. This is achieved by ORDER BY clause in DB2 stage itself.
Even after specifying ORDER BY in DB2, is there a necessity to include an explicit sort stage, and then again specify "don't sort " in that stage. Is there a point in doing this because it is equivalent to not including Sort Stage.
One can reduce the overhead of including additional stages when the same functionality is provided by the previous stage (ORDER BY clause in this case).
Even after specifying ORDER BY in DB2, is there a necessity to include an explicit sort stage, and then again specify "don't sort " in that stage. Is there a point in doing this because it is equivalent to not including Sort Stage.
One can reduce the overhead of including additional stages when the same functionality is provided by the previous stage (ORDER BY clause in this case).
-
- Participant
- Posts: 222
- Joined: Tue Aug 30, 2005 2:07 am
- Location: pune
- Contact:
One can reduce the overhead of including additional stages when the same functionality is provided by the previous stage (ORDER BY clause in this case).[/quote]
Here overhead of one sort stage is not a problem. The tsort operators from Datastage will overhead to your job. The tsort operators will degrade your performance. To avoid tsort opertors use sort satge with the option Don't sort if previously sorted and add environment variable APT_SORT_INSERTION_CHECK_ONLY. It will stop the tsort opertors, if data is already sorted.
Addition of one stage and perofrmance, you have to decide.
Here overhead of one sort stage is not a problem. The tsort operators from Datastage will overhead to your job. The tsort operators will degrade your performance. To avoid tsort opertors use sort satge with the option Don't sort if previously sorted and add environment variable APT_SORT_INSERTION_CHECK_ONLY. It will stop the tsort opertors, if data is already sorted.
Addition of one stage and perofrmance, you have to decide.
NageshSunkoji
If you know anything SHARE it.............
If you Don't know anything LEARN it...............
If you know anything SHARE it.............
If you Don't know anything LEARN it...............
-
- Participant
- Posts: 284
- Joined: Fri Oct 13, 2006 4:31 am
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
It may not make sense to you, but it's the technique we use to prevent insertion by the framework of unnecessary tsort operators. If you don't like it, then don't complain about the performance hit taken because of the inserted tsort operators resorting already-sorted data.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 222
- Joined: Tue Aug 30, 2005 2:07 am
- Location: pune
- Contact:
Hi Abhilash,abhilashnair wrote:Nagesh, do you mean to say that, if I include explicit sort stage, then DataStage won't use tsort operators?
It does not make sense to me to use sort stage and then again tell the stage "Dont' sort".
You can do one thing to avoid one extra sort stage, perform HASH partition on Join stage based on keys as you have already performed on Database and use environmental variable APT_SORT_INSERTION_CHECK_ONLY and try out for result. If your results are proper at the output of Join stage then that's ok for you.
NageshSunkoji
If you know anything SHARE it.............
If you Don't know anything LEARN it...............
If you know anything SHARE it.............
If you Don't know anything LEARN it...............
-
- Participant
- Posts: 284
- Joined: Fri Oct 13, 2006 4:31 am
I am not clear on this. I have few questions:
1) If I use an ORDER BY clause in DB2 stage, the output data is sorted. This sorted data is passed to join stage. Do you mean to say that in the join stage, tsort operators will again try to sort the already sorted data of the DB2 stage?
2) Suppose I use explicit sort stage in between. In that case, wont the framework again insert tsort in join stage?
3) What exactly is the use of sort stage? What is that APT_SORT_INSERTION_CHECK_ONLY operator used for?
4) If I use explicit sort, is there a need to include ORDER BY in DB2
1) If I use an ORDER BY clause in DB2 stage, the output data is sorted. This sorted data is passed to join stage. Do you mean to say that in the join stage, tsort operators will again try to sort the already sorted data of the DB2 stage?
2) Suppose I use explicit sort stage in between. In that case, wont the framework again insert tsort in join stage?
3) What exactly is the use of sort stage? What is that APT_SORT_INSERTION_CHECK_ONLY operator used for?
4) If I use explicit sort, is there a need to include ORDER BY in DB2
Yes. DataStage tries to optimise the code and puts in a tsort operator before the Join, eventhough the data has alredy been sorted in the DB/DB2 stage.1) If I use an ORDER BY clause in DB2 stage, the output data is sorted. This sorted data is passed to join stage. Do you mean to say that in the join stage, tsort operators will again try to sort the already sorted data of the DB2 stage?
If you put in a Sort stage before the join and mention the property as "Don't Sort", DS will not put in the tsort operator. Else the tsort operator is again put in.2) Suppose I use explicit sort stage in between. In that case, wont the framework again insert tsort in join stage?
3) What exactly is the use of sort stage? What is that APT_SORT_INSERTION_CHECK_ONLY operator used for?
This Environment Variable checks if the data is sorted or not and only if the data is not sorted inserts the tsort operator into the code for optimisation.
If you are using an Order By in the DB2 stage - then put the sort Stage and set the property inside as "Don't Sort".4) If I use explicit sort, is there a need to include ORDER BY in DB2.
If you are not doing an Order By in the DB2 stage, then put in the Sort stage, set the option to "Sort" and sort the data before feeding it to the Join stage.
Whatever be the sort option - the partitioning should always be Hash for the data going into the Join stage.
Aneesh
-
- Participant
- Posts: 284
- Joined: Fri Oct 13, 2006 4:31 am
Sort stage will itself insert a sort operator, and by giving Don't sort, you are suppresing it. Rather you can include the environmental variable APT_NO_SORT_INSERTION in that job if you have used order by in both the source stages.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
-
- Participant
- Posts: 284
- Joined: Fri Oct 13, 2006 4:31 am
Group by dosn't sort the data and Auto partition doesn't assure you the sorted output. The preferred option is to hold hash partition on the sorting key which should be same or should atleast cover up all the joining key.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
"Don't sort (previously grouped)"abhilashnair wrote:What if the output from DB2 is grouped but not sorted? What should be the partition method to bbe used in the Sort stage? Currently it is Auto.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.