Case sensitive option when sorting

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
shankar_ramanath
Premium Member
Premium Member
Posts: 67
Joined: Thu Aug 09, 2007 7:51 pm

Case sensitive option when sorting

Post by shankar_ramanath »

Please excuse my ignorance here. When I select a VARCHAR column to sort (Input-Partitioning-Hash-Sort), the option "Case-sensitive" is the default for the selected column. I would assume that "Case-sensitive" sort is more expensive than "Case-insensitive" sort. If so, why is it the default?

Thanks,
chowdhury99
Participant
Posts: 43
Joined: Thu May 29, 2008 8:41 pm

Post by chowdhury99 »

I don't think so. Unless you need case-insensitive, you may keep that option.

Thanks
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It's the default as that is the most common choice for that option.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Case insensitive is more costly, because an internal conversion (to all upper case, as fas as I am aware) has to be performed and the sort performed on that field.
Last edited by ray.wurlod on Fri Apr 16, 2010 7:42 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
shankar_ramanath
Premium Member
Premium Member
Posts: 67
Joined: Thu Aug 09, 2007 7:51 pm

Post by shankar_ramanath »

Thanks Ray. I agree. I did not think about it.
chowdhury99
Participant
Posts: 43
Joined: Thu May 29, 2008 8:41 pm

Post by chowdhury99 »

Thanks Ray for your valuable opinion.

Please correct me if I am wrong. Sort stage stores all data to memory space and then sorts. The most expensive part of the sort stage is storing data into memory space. Once all data is in the memory space, sorting operation does not take that much time compared to storing into memory space. That's why if you even add one more sorting key, overall processing time does go up that much. So, case sensitive option will not decrease the performance that much.

Please give your thoughtful opinion on this matter.

Best Regards.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

This is not the whole story. Only a certain amount of memory per node is available for sorting. Once this amount is exceeded, the sort must use disk files in the scratchdisk areas. This adds cost that you have not considered, probably more cost than the other steps (depending on the source of the rows to be sorted, which may already be in memory in a virtual Data Set).

The amount of memory available to the tsort operator is governed by the environment variable APT_TSORT_STRESS_BLOCKSIZE. If this environment variable is not set, then the Sort stage has a property that allows the amount of memory (per node) to be specified.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply