query about Join sort


wuruima
Participant
Posts: 65
Joined: Mon Nov 04, 2013 10:15 pm


Post by wuruima »

Hi,

As far as I know, when we use a Join stage to join two input links, DataStage sorts the data itself even if we don't set 'perform sort' on the input links. (Is this correct?)

My question is: can we choose how the sort is performed, e.g. in memory or on hard disk?
What is the Join stage's default?
wuruimao
Thomas.B
Participant
Posts: 63
Joined: Thu Apr 09, 2015 6:40 am
Location: France - Nantes

Post by Thomas.B »

To see whether DataStage sorts your data before the join operation, set APT_DUMP_SCORE to 1 and run the job.
The job log will then contain the 'score'; if it shows a 'tsort operator' for your join, DataStage is sorting your data.
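
If the score is long, searching it for 'tsort' can be scripted. A minimal Python sketch, assuming the score text has already been copied out of the job log into a string; the helper name find_inserted_sorts and the sample text are hypothetical, and the sample is heavily simplified compared with a real score:

```python
def find_inserted_sorts(score_text):
    """Return the lines of a score dump that mention a tsort operator."""
    return [
        line.strip()
        for line in score_text.splitlines()
        if "tsort" in line.lower()
    ]


# Hypothetical, simplified excerpt; a real score contains far more detail.
sample_score = """\
It has 3 operators:
op0[1p] {(sequential Input_A)}
op1[2p] {(parallel inserted tsort operator in Join_1)}
op2[2p] {(parallel Join_1)}
"""

hits = find_inserted_sorts(sample_score)
if hits:
    print("Sort found in score:")
    for hit in hits:
        print("  " + hit)
else:
    print("No tsort operator found in score.")
```

Any line it reports still has to be read in context to see which stage the sort belongs to.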

By default, when DataStage sorts data it uses 20MB per partition as an internal memory buffer. When more space is required, it uses the 'scratch' disk defined in the configuration file that APT_CONFIG_FILE points to.
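
To illustrate the general idea of "memory first, then scratch disk", here is a generic external merge sort in Python. It is not DataStage's tsort implementation, only a sketch of the technique: the function name external_sort, the record-count limit max_in_memory (standing in for the memory buffer) and the scratch_dir parameter are all assumptions made for this example.

```python
import heapq
import os
import tempfile


def external_sort(records, max_in_memory, scratch_dir=None):
    """Sort strings, spilling sorted runs to disk when the buffer is full.

    Records must not contain newline characters.
    Returns (sorted_records, number_of_runs_written_to_disk).
    """
    if max_in_memory < 1:
        raise ValueError("max_in_memory must be at least 1")

    run_paths = []
    buffer = []

    def spill():
        # Sort the in-memory buffer and write it out as one run file.
        buffer.sort()
        fd, path = tempfile.mkstemp(dir=scratch_dir, suffix=".run", text=True)
        with os.fdopen(fd, "w") as run_file:
            for rec in buffer:
                run_file.write(rec + "\n")
        run_paths.append(path)
        buffer.clear()

    for rec in records:
        if len(buffer) >= max_in_memory:
            spill()  # buffer is full: move it to scratch space first
        buffer.append(rec)

    if not run_paths:
        # Everything fit in the buffer, so no disk was needed.
        return sorted(buffer), 0

    if buffer:
        spill()

    # Merge the sorted runs, reading each file lazily line by line.
    files = [open(path) for path in run_paths]
    try:
        streams = [(line.rstrip("\n") for line in f) for f in files]
        merged = list(heapq.merge(*streams))
    finally:
        for f in files:
            f.close()
        for path in run_paths:
            os.remove(path)
    return merged, len(run_paths)


result, runs = external_sort(["d", "a", "c", "b", "e"], max_in_memory=2)
print(result, "runs on disk:", runs)
```

With a limit of two records, the five inputs are written as three sorted runs and then merged; with a limit of five or more, nothing touches the disk.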

I don't think you can choose which sort process to use.
BI Consultant
DSXConsult
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You can only choose which sort process to use by placing an explicit Sort stage on the input links to the Join stage.

You are correct that DataStage will insert a tsort operator by default if you have not otherwise specified sorting on the input links.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.