Is the sort merge collector optimised for node-sorted input?

zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

Hi,

I was wondering whether the sort merge collector is optimised for input that is already sorted in parallel (within each partition). One of the developers wanted a totally sorted sequential file and, to get it, placed a Sort stage before the Sequential File stage and left the collector in Auto mode.

Before switching the collector to sort merge, I wanted to know whether it would blindly re-sort the data or whether it is intelligent enough to recognise that the incoming data is already grouped and sorted on each node. The file is 100 GB, which is what forces me to think along these lines.

Interestingly, the file from the current run, despite being 100 GB, was totally sorted. I was expecting at least a few breaks. The mysteries of Auto mode :)

Thanks
- Zulfi
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The sort-merge collector does not re-sort the data (blindly or otherwise). It relies on the data already being sorted, partition by partition, on the indicated key. It monitors the next value queued to come in from each partition and transfers whichever one is next in sorted order.
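
To make the idea concrete, here is a minimal Python sketch of that kind of merge. It is only an illustration of the general k-way merge technique, not DataStage's actual implementation; the function name `sort_merge_collect` and the sample partitions are made up for the example. Each partition is assumed to be sorted already, and only the head value of each partition is ever compared.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

_DONE = object()  # sentinel marking an exhausted partition


def sort_merge_collect(partitions: List[Iterable[int]]) -> Iterator[int]:
    """Merge partitions that are each already sorted, without re-sorting.

    Hypothetical helper for illustration only.
    """
    iterators = [iter(p) for p in partitions]
    heap: List[Tuple[int, int]] = []

    # Queue the first value from every non-empty partition.
    for idx, it in enumerate(iterators):
        head = next(it, _DONE)
        if head is not _DONE:
            heap.append((head, idx))
    heapq.heapify(heap)

    # Repeatedly emit the smallest queued value, then refill from
    # the partition that value came from.
    while heap:
        value, idx = heapq.heappop(heap)
        yield value
        head = next(iterators[idx], _DONE)
        if head is not _DONE:
            heapq.heappush(heap, (head, idx))


# Three partitions, each sorted on its own (made-up data).
partitions = [[1, 4, 9], [2, 3, 10], [5, 6, 7, 8]]
result = list(sort_merge_collect(partitions))
```

The work per record is a comparison among at most one queued value per partition, so the cost grows with the number of partitions rather than with the size of the file. If any partition is not actually sorted, the output will not be sorted either, since nothing here re-sorts.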

Auto does not select sort-merge as the collection algorithm. It may be that your sorted parallel data were partitioned using a method amenable to the "hungry" round robin collection that Auto selects.
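
As a rough illustration of what "amenable" can mean, the Python sketch below models a strict round robin collector. That is a simplifying assumption: the "hungry" variant mentioned above may read partitions in a different order, and the helper names and data are invented for the example. It shows one partition layout where round robin collection happens to return sorted output, and another where it does not.

```python
from itertools import zip_longest
from typing import List

_MISSING = object()  # filler for partitions that run out early


def round_robin_partition(rows: List[int], n: int) -> List[List[int]]:
    """Deal rows out to n partitions in turn (hypothetical helper)."""
    return [rows[i::n] for i in range(n)]


def round_robin_collect(partitions: List[List[int]]) -> List[int]:
    """Take one row from each partition in turn, skipping exhausted ones."""
    out: List[int] = []
    for group in zip_longest(*partitions, fillvalue=_MISSING):
        out.extend(v for v in group if v is not _MISSING)
    return out


sorted_rows = list(range(1, 11))

# Case 1: sorted rows dealt out round robin, then collected round robin.
rr_parts = round_robin_partition(sorted_rows, 3)
restored = round_robin_collect(rr_parts)

# Case 2: each partition is sorted, but rows were not dealt out in turn
# (made-up layout, e.g. what a key-based partitioner might produce).
other_parts = [[1, 2, 9], [3, 4, 10], [5, 6, 7, 8]]
interleaved = round_robin_collect(other_parts)
```

In the first case the collector simply undoes the partitioning, so the original order comes back. In the second case the same collector interleaves the partitions and the total order is lost, which is why a totally sorted file from a non-merging collector should be treated as a coincidence of the data layout rather than a guarantee.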
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.