Need help understanding the functionality of the Join stage

zulfi123786 · Post by **zulfi123786** » Mon Jan 19, 2009 7:43 am

Hi,

How does a Join stage run?
For example, a Lookup stage places all reference records in memory and then matches the input link rows against them. But what about a Join stage?
Does it place only a few records in memory, and if so, on what basis? Or does it read all incoming data before outputting the first record? (Assume the data is already sorted and no sort is specified on the input link.)

samsuf2002 · Post by **samsuf2002** » Mon Jan 19, 2009 8:48 am

There is a good explanation of this in the documentation provided with DataStage.

ray.wurlod · Post by **ray.wurlod** » Mon Jan 19, 2009 4:05 pm

When there are two inputs, Left and Right, the Join stage works as follows. Recall that the inputs are sorted on the join key.

Get all the rows with the next key value from the Left input. (Usually this is not many rows.)
Get all the rows with that key value from the Right input. (This can be as few as zero.)
Generate all appropriate combinations (based on join type) and write these to the output.
Repeat until no more data.
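
For illustration only, the steps above can be sketched in Python as a sort-merge join. This is not DataStage's actual implementation; the names `key_groups` and `merge_join`, the dict-based rows, and the two supported join types are assumptions made for the sketch. It shows that only the rows for the current key value from each input need to be held at once, provided both inputs are already sorted on the key.

```python
from itertools import groupby
from operator import itemgetter


def key_groups(rows, key):
    """Yield (key_value, rows_with_that_key) from an input sorted on `key`."""
    for value, group in groupby(rows, key=itemgetter(key)):
        # Only the rows for one key value are materialised at a time.
        yield value, list(group)


def merge_join(left, right, key, join_type="inner"):
    """Join two iterables of dicts that are already sorted on `key`.

    Hypothetical sketch: supports "inner" and "left_outer" only.
    """
    left_groups = key_groups(left, key)
    right_groups = key_groups(right, key)
    lg = next(left_groups, None)
    rg = next(right_groups, None)
    while lg is not None:
        # Skip right-hand key values that have no partner on the left.
        while rg is not None and rg[0] < lg[0]:
            rg = next(right_groups, None)
        if rg is not None and rg[0] == lg[0]:
            # Generate every left/right combination for this key value.
            for lrow in lg[1]:
                for rrow in rg[1]:
                    yield {**rrow, **lrow}
            rg = next(right_groups, None)
        elif join_type == "left_outer":
            # No right rows for this key: pass the left rows through.
            yield from lg[1]
        lg = next(left_groups, None)


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"},
        {"id": 2, "a": "z"}, {"id": 4, "a": "w"}]
right = [{"id": 2, "b": 10}, {"id": 3, "b": 20},
         {"id": 4, "b": 30}, {"id": 4, "b": 40}]

inner_rows = list(merge_join(left, right, "id"))
outer_rows = list(merge_join(left, right, "id", join_type="left_outer"))
```

Because `merge_join` is a generator, the first output row can be produced as soon as the first matching key group has been read from both inputs, without reading all incoming data first.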

If there are more than two inputs, the same approach is taken, with input links being processed pairwise and intermediate results being stored in memory or, more likely, on scratchdisk.
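
As a rough picture of the pairwise processing, the sketch below folds a list of sorted inputs through a two-input inner join. The function names `join_sorted` and `join_many` are hypothetical, and the intermediate result is simply kept as a Python list here, standing in for whatever buffering in memory or on scratch disk the engine actually does.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter


def join_sorted(left, right, key="id"):
    """Inner-join two lists of dicts that are both sorted on `key`."""
    right_groups = groupby(right, key=itemgetter(key))
    rkey, rrows = None, []
    out = []
    for lkey, lgroup in groupby(left, key=itemgetter(key)):
        lrows = list(lgroup)
        # Advance the right input until its key catches up with the left key.
        while rkey is None or rkey < lkey:
            nxt = next(right_groups, None)
            if nxt is None:
                return out  # right input exhausted: no more matches possible
            rkey, rrows = nxt[0], list(nxt[1])
        if rkey == lkey:
            out.extend({**l, **r} for l in lrows for r in rrows)
    return out


def join_many(inputs, key="id"):
    """Join inputs pairwise: (((in0 JOIN in1) JOIN in2) JOIN ...)."""
    # Each intermediate result stays sorted on `key`, so it can feed the next join.
    return reduce(lambda acc, nxt: join_sorted(acc, nxt, key), inputs)


a = [{"id": 1, "a": 1}, {"id": 2, "a": 2}]
b = [{"id": 2, "b": 20}, {"id": 3, "b": 30}]
c = [{"id": 2, "c": 200}]
result = join_many([a, b, c])
```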