Question regarding Sort and Partitioning

Raftsman
Premium Member
Posts: 335
Joined: Thu May 26, 2005 8:56 am
Location: Ottawa, Canada

Post by Raftsman »

When should we implement a sort routine prior to the Aggregator? I know that if partitioning on the Aggregator's input tab is set to Auto, DataStage takes care of the sorting and grouping. What concerns me is the documentation, which states that we should sort and repartition the data before the Aggregator. Is that really required, given that the Director log shows a sort and grouping being done on the selected keys? I have tested both mechanisms and they both return exactly the same data.

Is repartitioning only used with the duplicate and join stages?

Could someone please elaborate?

Thanks
Jim Stewart
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Using an explicit Sort stage allows YOU to control the resources allocated to sorting and, perhaps, to assert that some or all of the key columns are already sorted/grouped. Relying upon DataStage to insert tsort operators means that you get default allocations, which may not be optimal.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
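
For readers wondering why the documentation asks for key-based partitioning and sorting ahead of a grouped aggregation at all, here is a conceptual sketch in plain Python. It is not DataStage code and does not reproduce how the engine or its inserted tsort operators work; the function names, the column names (cust, amt) and the two-partition setup are all made up for illustration. It only shows the general idea: if rows with the same key end up in different partitions, each partition produces its own partial group, whereas partitioning on the key first gives one row per key. That would be consistent with the observation in the question that both approaches return the same data, since with Auto the partitioning and sort are inserted for you.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def hash_partition(rows, key, n_parts):
    """Send every row with the same key value to the same partition."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        # crc32 is used only to get a stable hash for this illustration.
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % n_parts
        parts[idx].append(row)
    return parts


def round_robin_partition(rows, n_parts):
    """Deal rows out in turn, ignoring the key entirely."""
    parts = [[] for _ in range(n_parts)]
    for i, row in enumerate(rows):
        parts[i % n_parts].append(row)
    return parts


def sorted_aggregate(partition, key, value):
    """Sort one partition on the key, then sum `value` per key group."""
    ordered = sorted(partition, key=itemgetter(key))
    return [
        (k, sum(r[value] for r in group))
        for k, group in groupby(ordered, key=itemgetter(key))
    ]


def parallel_aggregate(parts, key, value):
    """Aggregate each partition independently and collect the results."""
    out = []
    for part in parts:
        out.extend(sorted_aggregate(part, key, value))
    return sorted(out)


if __name__ == "__main__":
    rows = [
        {"cust": "A", "amt": 10},
        {"cust": "B", "amt": 5},
        {"cust": "A", "amt": 7},
        {"cust": "C", "amt": 1},
        {"cust": "B", "amt": 2},
        {"cust": "A", "amt": 3},
    ]
    # Key-partitioned: one output row per customer.
    print(parallel_aggregate(hash_partition(rows, "cust", 2), "cust", "amt"))
    # Not key-partitioned: customers A and B appear once per partition.
    print(parallel_aggregate(round_robin_partition(rows, 2), "cust", "amt"))
```

With the sample rows, the key-partitioned run yields a single total per customer, while the round-robin run yields separate partial totals for A and B because their rows were split across the two partitions.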