Sort Merge

gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France

Post by gbusson »

Hi all,

I have a question about the sort merge library in DataStage EE (PX).

At first I thought that it worked correctly only when the partitions are already sorted on the key(s).

I've been told on DSXchange that the "pre-sort" is unnecessary.

I ran a test, and the resulting sort order is completely different between the two methods (pre-sort + sort merge vs. sort merge only).

Previous tests showed me that pre-sort + sort merge gives correct results.

So, who's right?

Thanks for your help.
michaeld
Premium Member
Posts: 88
Joined: Tue Apr 04, 2006 8:42 am
Location: Toronto, Canada

Post by michaeld »

If you do not insert a sort operator, DataStage will insert one for you behind the scenes. It will sort on the keys that are being merged.

It is best to insert the sort operator yourself, because that way you know exactly what it is sorting on and you have control over other aspects of the sort, such as memory usage and not re-sorting columns that are already sorted.

The mixed result may also be due to DataStage repartitioning your data. Set the partitioning to SAME to make sure that DataStage does not repartition, and check the score in the log for more details on what it is doing behind the scenes.
Mike
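
For illustration only, here is a small Python sketch of why the pre-sort can change the result. It assumes the sort merge step behaves like a plain k-way merge that always emits the smallest head record among the partitions; that is a simplification, not DataStage's actual implementation. The function name `sort_merge` and the sample partitions are hypothetical.

```python
import heapq

def sort_merge(partitions, key=lambda row: row):
    """K-way merge: repeatedly emit the smallest head row among the partitions.

    The output is globally sorted only if every input partition is
    already sorted on the same key.
    """
    return list(heapq.merge(*partitions, key=key))

# Hypothetical data: three partitions of integer keys, none of them sorted.
unsorted_partitions = [[5, 1, 9], [4, 8, 2], [7, 3, 6]]

# Sort merge only: the merge trusts the order inside each partition.
merge_only = sort_merge(unsorted_partitions)

# Pre-sort + sort merge: each partition is sorted before the merge.
presorted = sort_merge([sorted(p) for p in unsorted_partitions])

print("sort merge only:      ", merge_only)
print("pre-sort + sort merge:", presorted)
```

Both runs return the same rows, but only the pre-sorted run comes out in key order. Under this assumption, a different order between the two methods is what you would expect whenever no sort is applied before the merge.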
gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France

Post by gbusson »

Hi, I forgot to mention one thing:

I have nosortinsertion and nopartinsertion set, so automatic sort and partitioner insertion are disabled!