Query about Join stage

Posted: Thu May 21, 2015 1:20 am
by wuruima
Hi friends, I have two questions about using the Join stage.
Is it necessary to sort both input links on the keys we want to join on?

In a parallel job, does the partitioning affect the join result?

Thanks.

Posted: Thu May 21, 2015 4:37 pm
by ray.wurlod
Indirectly yes. If you don't specify sorting, then compiling the job will insert sort operators on each input link to the Join stage.

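Partitioning does matter: both inputs should be partitioned on at least the first join key, so that rows with the same key value end up in the same partition.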
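
To illustrate why the partitioning matters, here is a small Python sketch. It is not DataStage code, only a simulation under stated assumptions: each "partition" is a plain list, each partition is joined independently with a sort-merge inner join, and all function names and sample rows are hypothetical. It compares key-based (hash) partitioning with round-robin partitioning on the same inputs.

```python
from operator import itemgetter

def hash_partition(rows, key, n):
    """Send each row to a partition chosen from its key value."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def round_robin_partition(rows, n):
    """Send rows to partitions in turn, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def sort_merge_join(left, right, key):
    """Inner join of two lists of dicts; both inputs are sorted on the key first."""
    left = sorted(left, key=itemgetter(key))
    right = sorted(right, key=itemgetter(key))
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key value.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and left[i][key] == lk:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out

def parallel_join(left_parts, right_parts, key):
    """Join partition k of the left input only with partition k of the right."""
    out = []
    for lp, rp in zip(left_parts, right_parts):
        out.extend(sort_merge_join(lp, rp, key))
    return out

# Hypothetical sample data: every id exists on both sides.
left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"},
        {"id": 3, "name": "c"}, {"id": 4, "name": "d"}]
right = [{"id": 4, "amt": 40}, {"id": 3, "amt": 30},
         {"id": 2, "amt": 20}, {"id": 1, "amt": 10}]

hash_joined = parallel_join(hash_partition(left, "id", 2),
                            hash_partition(right, "id", 2), "id")
rr_joined = parallel_join(round_robin_partition(left, 2),
                          round_robin_partition(right, 2), "id")

print(len(hash_joined), len(rr_joined))
```

With key-based partitioning, matching rows always meet in the same partition, so every id finds its partner. With round-robin partitioning on this sample data, matching rows land in different partitions and are silently dropped from the inner join.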