Question regarding Sort and Partitioning

Post by **Raftsman** » Mon May 11, 2009 12:59 pm

When should we implement a sort routine prior to the Aggregator? I know that if partitioning on the Aggregator's input tab is set to Auto, DataStage takes care of the sorting and grouping. What concerns me is the documentation, which states that we should sort and repartition the data prior to the Aggregator. Is that really required, given that the Director log states that a sort and grouping is done on the selected keys? I have tested both mechanisms and they both return exactly the same data.

Is repartitioning only used with the duplicate and join stages?

Could someone please elaborate?

Thanks

Post by **ray.wurlod** » Mon May 11, 2009 4:12 pm

Using an explicit Sort stage allows YOU to control the resources allocated to sorting and, perhaps, the ability to assert that some or all of the key columns are already sorted/grouped. Relying upon DataStage to insert tsort operators means that you get default allocations, which may not be optimal.
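
As a rough illustration of why key-based partitioning and sorting matter before a grouped aggregation, here is a small Python simulation. It is not DataStage code, and the function names (`hash_partition`, `round_robin_partition`, `sorted_aggregate`, `run`) and the sample rows are hypothetical, made up for this sketch. It only shows the general idea: when rows are partitioned on the grouping key, each group is aggregated once; when they are not, the same key can show up in several partitions and produce several partial results.

```python
from itertools import groupby
from operator import itemgetter
import zlib


def hash_partition(rows, key, n):
    """Send every row with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        idx = zlib.crc32(str(row[key]).encode()) % n
        parts[idx].append(row)
    return parts


def round_robin_partition(rows, n):
    """Deal rows out in turn, ignoring the key (groups may be split)."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def sorted_aggregate(partition, key, value):
    """Sort one partition on the key, then sum the value per group."""
    ordered = sorted(partition, key=itemgetter(key))
    return [
        (k, sum(r[value] for r in grp))
        for k, grp in groupby(ordered, key=itemgetter(key))
    ]


def run(parts, key, value):
    """Aggregate each partition independently and collect the output."""
    out = []
    for part in parts:
        out.extend(sorted_aggregate(part, key, value))
    return sorted(out)


# Hypothetical sample data: customer key and amount.
rows = [
    {"cust": c, "amt": a}
    for c, a in [("A", 1), ("A", 2), ("B", 3), ("B", 4), ("C", 5), ("C", 6)]
]

hashed = run(hash_partition(rows, "cust", 2), "cust", "amt")
rr = run(round_robin_partition(rows, 2), "cust", "amt")

print(hashed)  # one row per customer
print(rr)      # each customer appears twice, once per partition
```

In this sketch the hash-partitioned run yields one total per customer, while the round-robin run yields two partial totals per customer. That is consistent with both of the tested approaches returning the same data: in either case the rows end up partitioned and sorted on the keys, whether by an explicit Sort stage or by the inserted tsort. The difference lies in who controls the sort and its resources.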