Sort stage or Link Sort

harikhk
Participant
Posts: 64
Joined: Tue Jun 04, 2013 11:36 am

Post by harikhk »

Hi,

I am writing data from a sequential file to a dataset.
The volume ranges from 8 million to 20 million rows, depending on the file.

The data needs to be sorted on a single key.

I am not sure whether a Sort stage or a link sort gives better performance at this volume.

Please help me understand which is the better choice.

My version is 8.5
Thanks,
HK
*Go GREEN..Save Earth*
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Use an explicit Sort stage. Partition data by the sort key.
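As a rough illustration of what "partition by the sort key, then sort" means, here is a minimal Python sketch. It is not DataStage code: the function names, the `cust_id` column and the use of CRC32 as the hash are all assumptions made for the example. It only shows that rows sharing a key value land in the same partition, and that each partition is then sorted on its own.

```python
import zlib


def hash_partition(rows, key, num_partitions):
    """Place each row in a partition chosen by a hash of its sort key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # CRC32 is used only because it is deterministic across runs;
        # it is an illustrative choice, not what the engine uses.
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[digest % num_partitions].append(row)
    return partitions


def sort_partitions(partitions, key):
    """Sort every partition independently on the same key."""
    return [sorted(part, key=lambda r: r[key]) for part in partitions]


# Hypothetical sample data: 20 rows, 5 distinct key values, 4 partitions.
rows = [{"cust_id": i % 5, "amount": i} for i in range(20)]
parts = sort_partitions(hash_partition(rows, "cust_id", 4), "cust_id")
```

Note that in this sketch each partition is ordered internally, but the partitions together do not form one globally ordered list.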

The Sort stage allows you to allocate more memory than the default to the sorting operation, which means it takes longer before the sort has to spill to scratchdisk.
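To make the memory-versus-spill trade-off concrete, here is a small Python sketch of a generic external sort. It is an assumption-laden illustration, not the engine's actual sort implementation: `external_sort`, `max_in_memory` and the temporary run files are hypothetical stand-ins for the sort's memory limit and the scratch disk. A larger in-memory budget means fewer (or no) runs written to disk before the final merge.

```python
import heapq
import os
import tempfile


def external_sort(values, max_in_memory):
    """Sort integers; return (sorted_list, number_of_runs_spilled_to_disk)."""
    run_paths = []
    buffer = []

    def spill():
        # The buffer is full: sort it and write it out as one run file.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in buffer)
        run_paths.append(path)
        buffer.clear()

    for v in values:
        buffer.append(v)
        if len(buffer) >= max_in_memory:
            spill()

    buffer.sort()
    if not run_paths:
        # Everything fitted in memory, so nothing touched the disk.
        return list(buffer), 0

    files = [open(p) for p in run_paths]
    try:
        runs = [(int(line) for line in f) for f in files]
        # Merge the sorted runs on disk with whatever is left in memory.
        merged = list(heapq.merge(buffer, *runs))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
    return merged, len(run_paths)


data = [5, 3, 9, 1, 7, 2, 8]
small_result, small_spills = external_sort(data, max_in_memory=3)
large_result, large_spills = external_sort(data, max_in_memory=100)
```

With the small budget the sketch writes two run files before merging; with the large budget it sorts entirely in memory. Both calls return the same ordered result.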

You can change the default memory allocation with the environment variable APT_TSORT_STRESS_BLOCKSIZE, but beware that this is a global change across the scope of the variable (project or job).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.