Page 1 of 1

Lookup vs Join

Posted: Thu May 07, 2009 9:09 pm
by shankar_ramanath
I see comparisions of Lookup and Join in several discussions and in the manuals.

I think one of the fundamental differences is the inability of Join to provide multiple output streams (one output and one reject) whereas Lookup stage provides for this functionality.

Am I correct? I use Lookup stage currently in one of my jobs only because I need the ability to get two output streams. I could do much better with Join except for this requirement.

Many thanks in advance!

Posted: Thu May 07, 2009 9:28 pm
by datisaq
You can use the Outer join in Join stage and filter it out.

Posted: Thu May 07, 2009 9:58 pm
by ray.wurlod
There's no such thing as a reject from a join, in exactly the same way that there's no "reject" when you perform a join in SQL.

Re: Lookup vs Join

Posted: Thu May 07, 2009 9:58 pm
by datastage_learner
shankar_ramanath wrote:I see comparisions of Lookup and Join in several discussions and in the manuals.

I think one of the fundamental differences is the inability of Join to provide multiple output streams (one output and one reject) whereas Lookup stage provides for this functionality.
If I am getting you correctly, then there is no inability... but rather a behavior. Most of the Stages are specific in their behavior.
shankar_ramanath wrote:Am I correct? I use Lookup stage currently in one of my jobs only because I need the ability to get two output streams. I could do much better with Join except for this requirement.
To understand your thought process in much better way, pls. describe the activity that you are doing with either lookup or join. Also, there are different ways to address problems.

Joins and lookups can be used for varied input and reference data streams and their memory needs. Pls. describe the circumstances you are using join or lookup.

Re: Lookup vs Join

Posted: Fri May 08, 2009 12:08 am
by shankar_ramanath
Thanks to all!

[quote]
If I am getting you correctly, then there is no inability... but rather a behavior. Most of the Stages are specific in their behavior.
[/quote]

I am probably constrained in my thought process because I consider providing two output streams as an "ability". That said, I do understand that the stages have specific behavior. My question was more about any behavior that I did not decipher in Join stage. Ray's answer clarifies the question.

[quote]
To understand your thought process in much better way, pls. describe the activity that you are doing with either lookup or join. Also, there are different ways to address problems.

Joins and lookups can be used for varied input and reference data streams and their memory needs. Pls. describe the circumstances you are using join or lookup.[/quote]

Sure. I have data in the following format

Col1 Col2 Col3 Col4
ABC DEF GHI JKL
XYZ YXZ BED ABC

I need to match "ABC" in the first record with "ABC" in the second record and the difference between the two records is in Col2. Hence I create two output streams. The first output stream has a filter for Col2="DEF" and the second output stream has a filter for Col2="YXZ". The output of the first filter (Col2="DEF") is then passed as a reference to the lookup stage and the output of the second filter (Col2="YXZ") is passed as the primary link to the lookup stage. In the lookup stage, I check to see if Col1 of the first record matches Col4 of the second record. The records that DO NOT match are sent to the reject stream and I am interested in getting only those records that do not match.

Using a Lookup stage is sub-optimal because of the huge number of records that go to the reference link. The input feed has close to 40M records on average. I was thinking that Join would be better to use because of the memory usage. However, since I am in need of the data from the reject link, I could not use Join.

I will try to use Left Outer Join. May be that will come to the rescue. Thanks to datisaq. Please do suggest if there are other options.

Many Thanks

Posted: Fri May 08, 2009 2:32 am
by ray.wurlod
That is a particularly difficult requirement. Most joins know in advance which columns are to be compared. Your requirement, if I understand correctly, is to find a match arbitrarily in any column. Neither a Join stage nor a Lookup stage can help you here, unless you perform every possible join/lookup (one for each pair of columns). Yuk.