Join stage

Posted: Thu Aug 18, 2011 6:27 am
by dsscholar
Hi Guys,

In the Join stage, I am aware that the parallel engine will insert a tsort operator to do the sort, which in turn requires temp space on the scratch disk to store temporary files. If I suppress that sort by setting the "don't sort" environment variable, the temp space won't be required. How does the join on the key happen in that case? Does it use temp space to do the join, or does it do it "on the fly" during database access? If it does use temp space, does the join require less temp space than the sort? Please explain this scenario.

Thanks in advance.

Posted: Thu Aug 18, 2011 3:27 pm
by ray.wurlod
The Join stage will not produce correct results, or may run out of memory, if the data are not sorted. If you prevent insertion of a tsort operator (without providing your own sorting on the input links), your job is likely to abort.
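To illustrate why sorted input matters, here is a minimal Python sketch of a merge-style inner join. It is only an analogy and not the parallel engine's actual implementation; the function name merge_join and the sample rows are made up for the example. It assumes both inputs arrive already sorted on the key, and it only ever buffers the rows for the current key value.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right, key=itemgetter(0)):
    """Inner-join two iterables that are assumed to be sorted on `key`.

    Hypothetical illustration only: rows are tuples, the key is the
    first field, and the output is the left row plus the right row's
    non-key fields.
    """
    left_groups = groupby(left, key)
    right_groups = groupby(right, key)
    lk, lrows = next(left_groups, (None, None))
    rk, rrows = next(right_groups, (None, None))
    while lrows is not None and rrows is not None:
        if lk < rk:
            # Left key is behind: skip ahead on the left input.
            lk, lrows = next(left_groups, (None, None))
        elif lk > rk:
            # Right key is behind: skip ahead on the right input.
            rk, rrows = next(right_groups, (None, None))
        else:
            # Keys match: buffer only the right rows for this one key.
            right_buf = list(rrows)
            for l in lrows:
                for r in right_buf:
                    yield l + r[1:]
            lk, lrows = next(left_groups, (None, None))
            rk, rrows = next(right_groups, (None, None))


sorted_left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
sorted_right = [(2, "x"), (3, "y"), (4, "z")]
good = list(merge_join(sorted_left, sorted_right))

# Same algorithm, but the left input is NOT sorted on the key.
unsorted_left = [(2, "b"), (1, "a"), (2, "c")]
other_right = [(1, "x"), (2, "y")]
bad = list(merge_join(unsorted_left, other_right))
```

With sorted inputs, good contains every match: (2, 'b', 'x'), (2, 'c', 'x') and (4, 'd', 'z'). With the unsorted left input, bad contains only (2, 'b', 'y'); the matches for key 1 and for the second key-2 row are silently lost, because the algorithm never looks backwards. The sketch also hints at the memory risk: the buffer for a single key grows with the number of duplicate rows for that key.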

Posted: Fri Aug 19, 2011 7:43 am
by dsscholar
Thanks Ray!

Does the Join stage use scratch disk space for the join operation, or does it use the database temp space, or does it use no temp space at all and just do the join and return the results "on the fly" with the database?

Thanks in advance.

Posted: Fri Aug 19, 2011 7:54 am
by jwiles
No, no and no.

The Join stage performs its work in memory, which is why it can run out of memory, as Ray mentioned. It doesn't use scratch space, temp space or database space.

Buffers on the input links may use scratch or temp space on your servers.

Regards,

Posted: Mon Aug 22, 2011 1:23 pm
by vijaykumarpj
To avoid having the tsort operator inserted, you can add an explicit Sort stage before the Join stage.