Selection of Lookup and Join

Post by **ReachKumar** » Wed Mar 17, 2010 5:07 am

Hi,

Performance-wise, which is the better choice between the Join and Lookup stages in DS Parallel, and why?

Can someone explain in which scenarios we should go for Join and in which for Lookup?

Post by **surajkumar** » Wed Mar 17, 2010 5:23 am

In all cases the deciding factor is the size of the reference datasets. If
these take up a large amount of memory relative to the physical RAM of the
machine you are running on, then a Lookup stage may thrash, because the
reference datasets may not fit in RAM along with everything else that has
to be there. This results in very slow performance, since each lookup
operation can, and typically does, cause a page fault and an I/O operation.

So, if the reference datasets are big enough to cause trouble, use a Join. A
Join does a high-speed sort on the driving and reference datasets. This can
involve I/O if the data is big enough, but that I/O is highly optimized and
sequential. Once the sort is over, the join processing itself is very fast
and should not involve paging or random I/O. Conversely, if the reference
datasets fit comfortably in memory, a Lookup is usually the simpler choice,
since it avoids the sort.
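
To make the difference concrete, here is a rough Python sketch of the two access patterns. It is only an illustration, not how the DataStage stages are implemented: the function names `hash_lookup` and `sort_merge_join`, the `(key, value)` row layout, and the assumption that reference keys are unique are all made up for the example.

```python
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[int, str]              # hypothetical (key, value) row
Joined = Tuple[int, str, str]      # (key, driving value, reference value)


def hash_lookup(driving: Iterable[Row], reference: Iterable[Row]) -> Iterator[Joined]:
    """Lookup-style: the whole reference set is held in memory."""
    # This table must fit in RAM; if it does not, each probe below
    # can touch a paged-out region and trigger random I/O.
    table: Dict[int, str] = {key: value for key, value in reference}
    for key, payload in driving:
        if key in table:
            yield key, payload, table[key]


def sort_merge_join(driving: Iterable[Row], reference: Iterable[Row]) -> Iterator[Joined]:
    """Join-style: sort both inputs on the key, then merge in one pass."""
    # In-memory sort for brevity; an engine would spill to disk with
    # sequential I/O when the data is too large.
    left = sorted(driving, key=lambda row: row[0])
    right = sorted(reference, key=lambda row: row[0])
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            yield left_key, left[i][1], right[j][1]
            # Reference keys are assumed unique, so only the driving
            # side advances; the next driving row may share this key.
            i += 1
```

Both functions return the same matched rows (in a different order). The lookup probes a memory-resident table at random, while the merge only ever walks forward through two sorted streams, which is why it stays fast when the reference data outgrows RAM.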

Post by **ReachKumar** » Wed Mar 17, 2010 6:04 am

Informative.. Thanks Suraj..

DSXchange
