how to decide using lookup or joiner

Rameshgoldenhill
Participant
Posts: 5
Joined: Tue Aug 26, 2008 9:00 am

how to decide using lookup or joiner

Post by Rameshgoldenhill »

How can we decide whether to use a lookup or a joiner when we are loading a large amount of data? Thanks.
cdp
Premium Member
Posts: 113
Joined: Tue Dec 15, 2009 9:28 pm
Location: New Zealand

Post by cdp »

Hi,

There are quite a few posts out there on this subject.

But I think it comes down to the size of the two input sources that you are trying to combine and what you want to do with the output.

For instance, joins can only be across two inputs and there is no reject link, while lookups are really designed to read a reference source into memory, so the reference input should be smaller than the amount of memory available on the box. So basically, if you have two large inputs then a join would probably be better!

There is also the Merge stage; again, it all depends on what your expected output looks like!
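
To make the memory trade-off concrete, here is a rough Python sketch of the two approaches. It is only a conceptual illustration, not DataStage code: the function names `lookup_style` and `join_style`, the `(key, payload)` row shape, and the assumption that reference keys are unique are all made up for the example.

```python
from typing import Iterable, List, Tuple

Row = Tuple[int, str]  # hypothetical row shape: (key, payload)


def lookup_style(stream: Iterable[Row], reference: Iterable[Row]):
    """Lookup-like behaviour: the whole reference input is held in memory."""
    ref = dict(reference)  # memory use grows with the size of the reference
    matched: List[Tuple[int, str, str]] = []
    rejected: List[Row] = []  # stands in for a reject link
    for key, payload in stream:
        if key in ref:
            matched.append((key, payload, ref[key]))
        else:
            rejected.append((key, payload))
    return matched, rejected


def join_style(left: Iterable[Row], right: Iterable[Row]):
    """Join-like behaviour: walk two key-sorted inputs side by side.

    Assumes keys in `right` are unique. sorted() is used here only to keep
    the demo short; the point is that, once the inputs are sorted, just the
    current row from each side has to be in memory.
    """
    right_iter = iter(sorted(right, key=lambda r: r[0]))
    current = next(right_iter, None)
    out: List[Tuple[int, str, str]] = []
    for key, payload in sorted(left, key=lambda r: r[0]):
        # Advance the right side until it catches up with the left key.
        while current is not None and current[0] < key:
            current = next(right_iter, None)
        if current is not None and current[0] == key:
            out.append((key, payload, current[1]))
        # Unmatched left rows are simply dropped: no reject output here.
    return out
```

With a small reference input the dictionary version is simple and fast; once both inputs are large, the sorted walk avoids having to fit either one in memory.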
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

A joiner is someone who assembles wooden furniture, particularly cabinetry. Your choice is therefore clear.

Unless, of course, you are loading large amounts of wooden objects...

When's the interview?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

cdp wrote: For instance, joins can only be across two inputs and there is no reject link, ...
This is not the case. A Join stage can have more than two inputs. In this case pairwise joins are created as intermediate results, the same way that databases do it. The "other" inputs are referred to as Intermediate. I prefer to use cascaded two-input joins to make it clearer what's happening to the next developer.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
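
As an illustration of the point about pairwise joins, a multi-input join can be thought of as a chain of two-input joins. The Python sketch below is only an analogy under stated assumptions: `inner_join`, `multi_join`, and the join column `id` are hypothetical names, and nothing here reflects how the Join stage is actually implemented.

```python
from functools import reduce
from typing import Dict, List

Row = Dict[str, object]


def inner_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Two-input inner join on the hypothetical key column 'id'."""
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row["id"], []).append(row)
    # Merge each left row with every right row that shares its key.
    return [{**l, **r} for l in left for r in index.get(l["id"], [])]


def multi_join(*inputs: List[Row]) -> List[Row]:
    """N-input join built as pairwise joins: ((a JOIN b) JOIN c) ..."""
    return reduce(inner_join, inputs)
```

Calling `multi_join(a, b, c)` gives the same rows as writing `inner_join(inner_join(a, b), c)` by hand, which is the cascaded form; the cascaded version just makes each intermediate result visible.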
John Smith
Charter Member
Posts: 193
Joined: Tue Sep 05, 2006 8:01 pm
Location: Australia

Post by John Smith »

Loading large amounts of data and performing lookups/joins are distinct operations, meaning you can load large amounts of data with EITHER joins OR lookups. It doesn't matter.
DS consultant.
cdp
Premium Member
Posts: 113
Joined: Tue Dec 15, 2009 9:28 pm
Location: New Zealand

Post by cdp »

ray.wurlod wrote: This is not the case. A Join stage can have more than two inputs. In this case pairwise joins are created as intermediate results, the same way that databases do it. The "other" inputs are referred to as Intermediate. I prefer to use cascaded two-input joins to make it clearer what's happening to the next developer.
Do you know what, you are absolutely correct. Maybe I was confusing should with could, but I was always told not to. Sorry for the incorrect advice.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Of course you may have been confusing should with wood. :lol:
(See my earliest post on this thread.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

[groan] :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers