Sort: DataStage vs. Unix

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

Post by bobyon »

I've read through many posts here regarding sorts but, as often happens, I'm left with as many new questions as answers.

What are the primary considerations when deciding between the DataStage sort and the Unix sort? Is DataStage preferred for sorting datasets, and Unix for sequential files?

And, other than additional hardware and licenses, what are some general recommendations for improving sort throughput?

I've avoided the use of stable sort except where data is previously sorted, as I understand that it will use additional resources.
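
As a generic illustration of what the stable option buys (plain Python, not DataStage; the sample records are made up), a stable sort keeps rows with equal keys in their incoming order, which is why it only matters when the input already carries an order worth preserving:

```python
# Hypothetical records: (state, sequence number). The input is already
# ordered by sequence number, and we now sort on state only.
records = [("NC", 1), ("CO", 2), ("NC", 3), ("CO", 4), ("NC", 5)]

# Python's sorted() is guaranteed to be stable, so within each state
# the rows keep the sequence order they arrived in.
by_state = sorted(records, key=lambda r: r[0])

print(by_state)
# [('CO', 2), ('CO', 4), ('NC', 1), ('NC', 3), ('NC', 5)]
```

A non-stable sort is free to return the rows for each state in any order, which is fine when nothing downstream depends on the prior ordering.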

I've coded all sorts as explicit Sort stages (rather than in-line) for two reasons: 1) clarity when maintaining the job; and 2) so we can increase the restrict memory usage option. However, I have not found any particular method for determining what that value should be. Is there some formula that gives a good starting point for tuning this option?

Thanks
Bob
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hmmm... not sure if you stayed in "job type" when you looked, but I would think most discussions of sort here, DS v. UNIX, would be for a Server job. The sort there is more of a "poor man's" sort and not really suited for large volumes, necessitating a fall back to the O/S at times. Not really sure that is all that necessary for the PX product.

And to me, the only circumstance that might bring me out of PX to sort would be when the client has access to some kind of high-speed sort package, like SyncSort or Co-sort. My two cents, anyway.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The best way to tune the Sort stage memory property is to measure its memory consumption using the Performance Analysis tool (maybe even Resource Estimator). Increase memory until the tsort operator stops using disk.

Beware that if APT_TSORT_STRESS_BLOCKSIZE is set, it overrides any memory property setting in Sort stages.
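
To get a rough starting point before measuring, the "increase memory until it stops using disk" idea can be sketched with a simple model of a generic external sort. This is only a back-of-envelope estimate, not how tsort actually accounts for memory: the function names, the 20 MB starting value and step, and the row counts are all assumptions for illustration, and real per-row overhead will push the true figure higher.

```python
import math

MB = 1024 * 1024

def spill_runs(partition_bytes: int, memory_bytes: int) -> int:
    """Sorted runs a generic external sort would write to scratch disk.

    Returns 0 when the partition fits entirely in the sort buffer.
    Simplified model: ignores per-row overhead and merge buffers.
    """
    if memory_bytes <= 0:
        raise ValueError("memory_bytes must be positive")
    if partition_bytes <= memory_bytes:
        return 0
    return math.ceil(partition_bytes / memory_bytes)

def first_memory_without_spill(partition_bytes: int,
                               start_mb: int = 20,
                               step_mb: int = 20) -> int:
    """Step the memory setting up until the model predicts no spill."""
    mb = start_mb
    while spill_runs(partition_bytes, mb * MB) > 0:
        mb += step_mb
    return mb

# Assumed workload: 2,000,000 rows of ~100 bytes spread over 4 partitions.
per_partition = 2_000_000 * 100 // 4

print(spill_runs(per_partition, 20 * MB))         # 3 runs at 20 MB
print(first_memory_without_spill(per_partition))  # 60 (MB)
```

Treat the result as a first guess per partition only, then confirm with actual measurements that the sort has stopped writing to scratch disk.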
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.