is sorting before joining mandatory?
Moderators: chulett, rschirm, roy
Is it enough if I partition the data on the joining keys before join? Or should I also sort the data on the joining keys? Also if I do not sort, will the join output be wrong?
Re: is sorting before joining mandatory?
Hi,
The join result will not be affected. However, the job will not run optimally and may cause thrashing. Sorting takes a lot of memory and time, so it is better to sort and partition the dataset on the join keys yourself before joining.
Note: Sorted input also minimizes memory requirements, because fewer rows need to be in memory at any one time.
tejaswini wrote:Is it enough if I partition the data on the joining keys before join? Or should I also sort the data on the joining keys? Also if I do not sort, will the join output be wrong?
sanjeev kumar
Hi,
To get accurate results, it is always better to sort the data and hash partition it on the join keys as well. You can also set the environment variable APT_SORT_INSERTION_CHECK_ONLY. For stages like Join, DataStage inserts a tsort operator on the input links, even if you have already sorted the data before sending it to the Join stage. With this variable set, the inserted sort only checks the sort order; if the data is already sorted, it is not sorted again. This can improve performance noticeably.
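As a rough illustration of why hash partitioning and sorting go together, here is a minimal Python sketch, not DataStage code. The function names (`hash_partition`, `partition_and_sort`) and the use of CRC32 as the hash are assumptions made for the example; the engine's actual hash function is not described in this thread. It shows that hashing both inputs on the join key sends equal keys to the same partition number, and that each partition is then sorted on that key.

```python
import zlib


def hash_partition(rows, num_partitions, key_index=0):
    """Distribute rows across partitions by a hash of the join key.

    CRC32 is an assumption for this sketch; any deterministic hash works.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        key_bytes = str(row[key_index]).encode("utf-8")
        partitions[zlib.crc32(key_bytes) % num_partitions].append(row)
    return partitions


def partition_and_sort(rows, num_partitions, key_index=0):
    """Hash partition on the join key, then sort each partition on that key."""
    return [
        sorted(part, key=lambda r: r[key_index])
        for part in hash_partition(rows, num_partitions, key_index)
    ]


if __name__ == "__main__":
    left = [(3, "c"), (1, "a"), (2, "b"), (1, "d")]
    right = [(2, "x"), (1, "y"), (3, "z")]
    for i, (lp, rp) in enumerate(zip(partition_and_sort(left, 2),
                                     partition_and_sort(right, 2))):
        print(i, lp, rp)
```

Because both inputs use the same hash on the same key, a given key value can only ever meet its matches inside one partition, so each partition can be joined independently.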
NageshSunkoji
If you know anything SHARE it.............
If you Don't know anything LEARN it...............
The Join stage requires its inputs to be sorted, so that it can employ an efficient memory management algorithm. If you don't specify sorted data, the composed score will have tsort operators inserted on the input links so that the data will, in fact, be sorted. It is a far better technique to retain control of sorting yourself, so that unnecessary sorting does not occur.
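To make the memory argument concrete, here is a minimal sketch of a merge join over key-sorted inputs, written in Python for illustration only. It is not the Join stage's actual implementation; the function name `merge_join` and the row layout (tuples with the join key in the first field) are assumptions for the example. Only the rows of the current key group on the right input are buffered at any time.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right, key=itemgetter(0)):
    """Inner join of two iterables that are ALREADY sorted on the join key.

    Rows are assumed to be tuples whose first field is the join key.
    """
    left_groups = groupby(left, key)
    right_groups = groupby(right, key)
    lk, lrows = next(left_groups, (None, None))
    rk, rrows = next(right_groups, (None, None))
    while lrows is not None and rrows is not None:
        if lk < rk:
            # Left key has no match yet: advance the left input.
            lk, lrows = next(left_groups, (None, None))
        elif lk > rk:
            # Right key has no match yet: advance the right input.
            rk, rrows = next(right_groups, (None, None))
        else:
            # Keys match: buffer only this one key group from the right.
            right_buf = list(rrows)
            for l in lrows:
                for r in right_buf:
                    yield l + r[1:]
            lk, lrows = next(left_groups, (None, None))
            rk, rrows = next(right_groups, (None, None))


if __name__ == "__main__":
    left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    right = [(2, "x"), (3, "y"), (4, "z"), (4, "w")]
    print(list(merge_join(left, right)))
```

Note that this sketch silently drops matches if its inputs are not sorted, since it never looks back at keys it has already passed. An algorithm of this kind depends on sorted input, which is consistent with the engine inserting tsort operators when the links are not marked as sorted.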
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.