need help in understanding the functionality of join stage


zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

Hi,

How does a Join stage run?
For example, a Lookup stage loads all reference records into memory and then matches the input link rows against them. What about a Join stage?
Does it hold only a few records in memory, and if so, on what basis? Or does it read all incoming data before outputting the first record? (Assume the data is already sorted and no sort is specified on the input link.)
samsuf2002
Premium Member
Posts: 397
Joined: Wed Apr 12, 2006 2:28 pm
Location: Tennesse

Post by samsuf2002 »

The documentation provided with DataStage has a good explanation of this.
hi sam here
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

When there are two inputs, Left and Right, the Join stage works as follows. Recall that the inputs are sorted on the join key.

Get all the rows with the next key value from the Left input. (Usually this is not many rows.)
Get all the rows with that key value from the Right input. (This can be as few as zero.)
Generate all appropriate combinations (based on join type) and write these to the output.
Repeat until no more data.
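
The steps above can be sketched in Python. This is only an illustration of the key-group idea, not DataStage's actual implementation: the names `key_groups` and `merge_join` are hypothetical, rows are assumed to be dicts, both inputs are assumed to be already sorted on the join key, and only inner and left outer joins are handled.

```python
from itertools import groupby
from operator import itemgetter


def key_groups(rows, key):
    """Yield (key_value, [rows]) for each run of equal keys in a sorted input."""
    for value, group in groupby(rows, key=itemgetter(key)):
        yield value, list(group)


def merge_join(left, right, key, how="inner"):
    """Join two inputs sorted on `key`, holding one key group per side in memory.

    Hypothetical sketch: supports how="inner" and how="left" only.
    """
    done = (None, None)  # sentinel meaning "input exhausted"
    left_groups = key_groups(left, key)
    right_groups = key_groups(right, key)
    left_key, left_rows = next(left_groups, done)
    right_key, right_rows = next(right_groups, done)

    while left_rows is not None:
        # Discard Right groups whose key is smaller than the current Left key.
        while right_rows is not None and right_key < left_key:
            right_key, right_rows = next(right_groups, done)

        if right_rows is not None and right_key == left_key:
            # Generate every Left x Right combination for this key value.
            for l_row in left_rows:
                for r_row in right_rows:
                    yield {**r_row, **l_row}
        elif how == "left":
            # No match on the Right: a left outer join keeps the Left rows.
            for l_row in left_rows:
                yield dict(l_row)

        # Move on to the next key value from the Left input.
        left_key, left_rows = next(left_groups, done)


if __name__ == "__main__":
    left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 2, "a": "z"}]
    right = [{"id": 2, "b": "p"}, {"id": 3, "b": "r"}]
    for row in merge_join(left, right, "id", how="left"):
        print(row)
```

Because the function is a generator, the first output row is produced as soon as the first matching key group has been read from each side. Memory use is bounded by the largest key group, not by the size of either input, which is the contrast with the Lookup stage raised in the question.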

If there are more than two inputs, the same approach is taken, with input links being processed pairwise and intermediate results being stored in memory or, more likely, on scratchdisk.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.