Please excuse my ignorance here. When I select a VARCHAR column to sort (Input-Partitioning-Hash-Sort), the option "Case-sensitive" is the default for the selected column. I would assume that "Case-sensitive" sort is more expensive than "Case-insensitive" sort. If so, why is it the default?
Thanks,
Case sensitive option when sorting
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 67
- Joined: Thu Aug 09, 2007 7:51 pm
-
- Participant
- Posts: 43
- Joined: Thu May 29, 2008 8:41 pm
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Case insensitive is more costly, because an internal conversion (to all upper case, as fas as I am aware) has to be performed and the sort performed on that field.
Last edited by ray.wurlod on Fri Apr 16, 2010 7:42 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 67
- Joined: Thu Aug 09, 2007 7:51 pm
-
- Participant
- Posts: 43
- Joined: Thu May 29, 2008 8:41 pm
Thanks Ray for your valuable opinion.
Please correct me if I am wrong. Sort stage stores all data to memory space and then sorts. The most expensive part of the sort stage is storing data into memory space. Once all data is in the memory space, sorting operation does not take that much time compared to storing into memory space. That's why if you even add one more sorting key, overall processing time does go up that much. So, case sensitive option will not decrease the performance that much.
Please give your thoughtful opinion on this matter.
Best Regards.
Please correct me if I am wrong. Sort stage stores all data to memory space and then sorts. The most expensive part of the sort stage is storing data into memory space. Once all data is in the memory space, sorting operation does not take that much time compared to storing into memory space. That's why if you even add one more sorting key, overall processing time does go up that much. So, case sensitive option will not decrease the performance that much.
Please give your thoughtful opinion on this matter.
Best Regards.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
This is not the whole story. Only a certain amount of memory per node is available for sorting. Once this amount is exceeded, the sort must use disk files in the scratchdisk areas. This adds cost that you have not considered, probably more cost than the other steps (depending on the source of the rows to be sorted, which may already be in memory in a virtual Data Set).
The amount of memory available to the tsort operator is governed by the environment variable APT_TSORT_STRESS_BLOCKSIZE. If this environment variable is not set, then the Sort stage has a property that allows the amount of memory (per node) to be specified.
The amount of memory available to the tsort operator is governed by the environment variable APT_TSORT_STRESS_BLOCKSIZE. If this environment variable is not set, then the Sort stage has a property that allows the amount of memory (per node) to be specified.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.