Hi,
How does a Join stage run?
For example, a Lookup stage places all reference records in memory and then matches the input link rows against them. But what about a Join stage?
Does it place only a few records in memory, and if so, on what basis? Or does it read all incoming data before outputting the first record? (Assume the data is already sorted and no sort is specified on the input link.)
I need help understanding how the Join stage works.
When there are two inputs, Left and Right, the Join stage works as follows. Recall that the inputs are sorted on the join key.
Get all the rows with the next key value from the Left input. (Usually this is not many rows.)
Get all the rows with that key value from the Right input. (This can be as few as zero.)
Generate all appropriate combinations (based on join type) and write these to the output.
Repeat until no more data.
If there are more than two inputs, the same approach is taken, with input links being processed pairwise and intermediate results being stored in memory or, more likely, on scratchdisk.
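To make the steps above concrete, here is a minimal Python sketch of a generic sort-merge join over two inputs. It is only an illustration of the technique, not the Join stage's actual implementation: the function name merge_join, the key and join_type parameters, and the tuple row layout are all assumptions made for this example. It covers only inner and left outer joins and assumes both inputs are already sorted on the key.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right, key=itemgetter(0), join_type="inner"):
    """Join two iterables that are already sorted on `key`.

    Only the rows for the current key value are buffered in memory.
    Hypothetical helper for illustration; supports "inner" and "left".
    """
    left_groups = groupby(left, key=key)
    right_groups = groupby(right, key=key)
    l = next(left_groups, None)
    r = next(right_groups, None)

    while l is not None:
        left_key = l[0]
        # Get all the rows with the next key value from the Left input.
        left_rows = list(l[1])

        # Skip Right key groups that sort before the current Left key.
        while r is not None and r[0] < left_key:
            r = next(right_groups, None)

        if r is not None and r[0] == left_key:
            # Get all the rows with that key value from the Right input.
            right_rows = list(r[1])
            # Generate every Left/Right combination for this key.
            for a in left_rows:
                for b in right_rows:
                    yield (a, b)
            r = next(right_groups, None)
        elif join_type == "left":
            # No match on the Right: a left outer join keeps the Left rows.
            for a in left_rows:
                yield (a, None)

        # Repeat until no more data.
        l = next(left_groups, None)


left = [(1, "a"), (1, "b"), (2, "c"), (4, "d")]
right = [(1, "x"), (3, "y"), (4, "z"), (4, "w")]
inner_result = list(merge_join(left, right))
left_result = list(merge_join(left, right, join_type="left"))
```

Because each pass holds only one key group from each side, the first output row can be produced as soon as the first matching key has been read from both inputs, without reading all incoming data first.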
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.