How to decide between using lookup or joiner

Posted: Sun Mar 14, 2010 12:29 pm
by Rameshgoldenhill
How can we decide whether to use a lookup or a joiner when we are loading a large amount of data? Thanks.

Posted: Sun Mar 14, 2010 4:14 pm
by cdp
Hi,

There are quite a few posts out there on this subject.

But I think it comes down to the size of the two input sources that you are trying to combine and what you want to do with the output.

For instance, joins can only be across two inputs and there is no reject link, while lookups are really designed to read a reference source into memory, so the reference input should be smaller than the amount of memory available on the box. So basically, if you have two large inputs, then a join would probably be better!

There is also the MERGE stage; again, it all depends on what your expected output looks like!
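
To illustrate the trade-off described above, here is a minimal Python sketch. It is not DataStage code and does not reflect how the engine is implemented; the function names (`lookup_join`, `sorted_merge_join`) and the sample rows are hypothetical. The first function loads the whole reference input into memory and can report unmatched rows (similar in spirit to a reject link), while the second walks two inputs that are already sorted on the key and holds only one reference row at a time. The merge version assumes the keys on the right-hand input are unique.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def lookup_join(stream: Iterable[Row], reference: Iterable[Row],
                key: str) -> Tuple[List[Row], List[Row]]:
    """In-memory lookup: the entire reference input is indexed up front."""
    # Memory use grows with the size of the reference input.
    index = {row[key]: row for row in reference}
    matched: List[Row] = []
    rejected: List[Row] = []
    for row in stream:
        ref = index.get(row[key])
        if ref is None:
            rejected.append(row)          # no match: send to the "reject" list
        else:
            matched.append({**row, **ref})
    return matched, rejected


def sorted_merge_join(left: Iterable[Row], right: Iterable[Row],
                      key: str) -> Iterator[Row]:
    """Inner join over two inputs that are both sorted on `key`.

    Assumes keys in `right` are unique. Only one row of `right`
    is held in memory at any time.
    """
    right_iter = iter(right)
    current = next(right_iter, None)
    for row in left:
        # Advance the right side until it catches up with the left key.
        while current is not None and current[key] < row[key]:
            current = next(right_iter, None)
        if current is not None and current[key] == row[key]:
            yield {**row, **current}


# Hypothetical sample data, already sorted on "id".
orders = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}, {"id": 4, "amt": 40}]
customers = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]

matched, rejected = lookup_join(orders, customers, "id")
merged = list(sorted_merge_join(orders, customers, "id"))
```

Both approaches produce the same matched rows here; the difference is that the lookup needs the whole reference set in memory, whereas the merge-style join needs sorted inputs instead.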

Posted: Sun Mar 14, 2010 4:16 pm
by ray.wurlod
A joiner is someone who assembles wooden furniture, particularly cabinetry. Your choice is therefore clear.

Unless, of course, you are loading large amounts of wooden objects...

When's the interview?

Posted: Sun Mar 14, 2010 4:19 pm
by ray.wurlod
cdp wrote: For instance, joins can only be across two inputs and there is no reject link, ...
This is not the case. A Join stage can have more than two inputs. In this case pairwise joins are created as intermediate results, the same way that databases do it. The "other" inputs are referred to as Intermediate. I prefer to use cascaded two-input joins to make it clearer what's happening to the next developer.
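
The idea of a multi-input join being resolved as a chain of pairwise joins can be sketched in Python as below. This is only an illustration of the concept, not the Join stage's actual implementation; `join_two`, `join_many` and the sample inputs are hypothetical names, and an inner join on a single shared key is assumed.

```python
from functools import reduce
from typing import Dict, List

Row = Dict[str, object]


def join_two(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner join of two inputs on `key`."""
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def join_many(inputs: List[List[Row]], key: str) -> List[Row]:
    """Join any number of inputs by cascading two-input joins.

    Each step joins the intermediate result so far with the next input.
    """
    return reduce(lambda acc, nxt: join_two(acc, nxt, key), inputs)


# Hypothetical sample inputs sharing the key column "k".
a = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
b = [{"k": 1, "b": "p"}, {"k": 2, "b": "q"}]
c = [{"k": 1, "c": "m"}]

result = join_many([a, b, c], "k")
```

Writing the cascade out explicitly, as `join_two(join_two(a, b, "k"), c, "k")`, gives the same result and makes each intermediate step visible.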

Posted: Sun Mar 14, 2010 4:52 pm
by John Smith
Loading large amounts of data and performing lookups/joins are distinct operations, meaning you can load large amounts of data with EITHER joins OR lookups. It doesn't matter.

Posted: Sun Mar 14, 2010 6:29 pm
by cdp
ray.wurlod wrote: This is not the case. A Join stage can have more than two inputs. In this case pairwise joins are created as intermediate results, the same way that databases do it. The "other" inputs are referred to as Intermediate. I prefer to use cascaded two-input joins to make it clearer what's happening to the next developer.
Do you know what, you are absolutely correct. Maybe I was confusing should with could, but I was always told not to. Sorry for the incorrect advice.

Posted: Sun Mar 14, 2010 6:53 pm
by ray.wurlod
Of course you may have been confusing should with wood. :lol:
(See my earliest post on this thread.)

Posted: Sun Mar 14, 2010 9:15 pm
by chulett
[groan] :wink: