
DataStage Sort Stage vs Inline Link Sort

Posted: Wed Dec 04, 2013 8:44 am
by rwierdsm
Folks,

The Sort stage in 8.x works much better than it used to in 7.x. We ran some benchmarks which indicated that, for our test file and environment, the Sort stage was about twice as fast as the inline link sort, presumably because the Sort stage can allocate more memory to the sort.

Is everyone seeing this kind of performance difference? Are there drawbacks to using the sort stage over the inline sort?

Rob

Posted: Wed Dec 04, 2013 9:10 am
by chulett
They're the same tsort operator under the covers. The stage just makes it more 'visible' in the job and gives you control over the parameters the sort can use.

Posted: Wed Dec 04, 2013 9:18 am
by rwierdsm
Thanks for your response Craig.

Does the sort stage default to a higher amount of allocated memory? We saw significantly better performance in the stage. Is there some risk in using too much memory when lots of sort stages are invoked at the same time?

Rob

Posted: Wed Dec 04, 2013 9:36 am
by chulett
Not sure about the default... others will have to answer that. As to the risk, sure, there's always that kind of resource issue risk when doing lots of anything. :wink:

Posted: Thu Dec 05, 2013 3:59 am
by ray.wurlod
The stage defaults to the same amount of memory as the inlink sort. The difference is that you can change it in the stage.

The global memory for tsort operators is set by the environment variable APT_TSORT_STRESS_BLOCKSIZE.

Yes, you can demand more memory than the system can provide. You can even do this at the default setting. The symptom is a lot of temporary files with "sort" as part of their name in the scratchdisk.
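One way to watch for that symptom is to scan the scratch disk while a job runs. The following is only a rough sketch: the function name is made up for illustration, the scratch directory has to come from your own configuration file, and matching on "sort" in the file name is simply the rule of thumb described above, not a documented naming scheme.

```python
import os


def find_sort_spill_files(scratch_dir):
    """Return sorted (path, size_in_bytes) pairs for files under scratch_dir
    whose name contains 'sort' (case-insensitive)."""
    hits = []
    for root, _dirs, files in os.walk(scratch_dir):
        for name in files:
            if "sort" in name.lower():
                path = os.path.join(root, name)
                try:
                    hits.append((path, os.path.getsize(path)))
                except OSError:
                    # Temporary files can disappear while the job is running.
                    continue
    return sorted(hits)
```

Calling it with the scratchdisk path of a node and summing the sizes gives a quick idea of how much the sorts are spilling to disk.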

Posted: Thu Dec 05, 2013 12:49 pm
by rwierdsm
From the IBM doco for 8.5

===========
Restrict memory usage
This is set to 20 by default. It causes the Sort stage to restrict itself to the specified number of megabytes of virtual memory on a processing node.

The number of megabytes specified should be smaller than the amount of physical memory on a processing node. For Windows systems, the value for Restrict Memory Usage should not exceed 500.
==============

This number can be modified on the Properties tab. I was not able to find an indication of how much memory is used by the inline sort; however, based on our benchmarks, it would be considerably less.
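For the earlier question about many sorts running at once, a back-of-the-envelope estimate might look like the sketch below. It assumes that every sort on every processing node can grow to its full Restrict Memory Usage limit at the same time, which is a worst case and not something the doco states. The function name and the example numbers are illustrative only.

```python
def estimated_sort_memory_mb(sort_count, node_count, restrict_mb=20):
    """Worst-case memory, in MB, if every sort on every processing node
    reaches its Restrict Memory Usage limit simultaneously.

    restrict_mb defaults to 20, the default quoted from the 8.5 doco.
    """
    return sort_count * node_count * restrict_mb


# Example: 5 concurrent sorts on a 4-node configuration.
default_total = estimated_sort_memory_mb(5, 4)                  # 5 * 4 * 20
raised_total = estimated_sort_memory_mb(5, 4, restrict_mb=256)  # 5 * 4 * 256
```

When several logical nodes share one physical server, the estimate applies to that server's memory as a whole.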

Rob