Selection of Lookup and Join

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

ReachKumar
Participant
Posts: 29
Joined: Wed Jan 06, 2010 7:18 am

Post by ReachKumar

Hi,

Performance-wise, which is the better choice in DataStage parallel jobs, the Join stage or the Lookup stage, and why?

Can someone explain in which scenarios we should use a Join and in which a Lookup?
Regards,
Kumar
surajkumar
Participant
Posts: 17
Joined: Wed Feb 06, 2008 5:09 am

Re: Selection of Lookup and Join

Post by surajkumar

In all cases, the deciding factor is the size of the reference datasets. The Lookup stage holds its reference data in memory and probes it once for each input row. If the reference datasets take up a large amount of memory relative to the physical RAM of the machine you are running on, they may not fit in RAM along with everything else that has to be there, and the Lookup stage may thrash. Performance then becomes very slow, because each lookup operation can, and typically does, cause a page fault and an I/O operation.
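DataStage jobs are designed graphically rather than coded, so the following is only a conceptual Python analogy, not the stage's actual implementation. It assumes the lookup behaves like an in-memory hash table: every reference row is loaded into a dict first, then each driving row triggers one random-access probe. The function names and sample data are hypothetical. The point is that the dict grows with the reference data; once it no longer fits in physical RAM, those random probes are what turn into page faults.

```python
from typing import Dict, Iterable, Iterator

Row = Dict[str, object]


def build_lookup_table(reference: Iterable[Row], key: str) -> Dict[object, Row]:
    """Load every reference row into an in-memory dict keyed on `key`.

    Memory use grows with the size of the reference data.
    If a key repeats, the last row wins (a simplifying assumption).
    """
    table: Dict[object, Row] = {}
    for row in reference:
        table[row[key]] = row
    return table


def lookup(driving: Iterable[Row], reference: Iterable[Row], key: str) -> Iterator[Row]:
    """Stream the driving rows and probe the in-memory table once per row."""
    table = build_lookup_table(reference, key)
    for row in driving:
        match = table.get(row[key])  # random access into the reference table
        if match is not None:
            # Combine reference and driving columns; driving values win on clashes.
            yield {**match, **row}
        # In this sketch, driving rows with no match are simply dropped.


if __name__ == "__main__":
    customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bo"}]
    orders = [{"order_id": 10, "cust_id": 2}, {"order_id": 11, "cust_id": 3},
              {"order_id": 12, "cust_id": 1}]
    print(list(lookup(orders, customers, "cust_id")))
    # prints [{'cust_id': 2, 'name': 'Bo', 'order_id': 10}, {'cust_id': 1, 'name': 'Ann', 'order_id': 12}]
```

Note that the driving input is never sorted here; only the reference side has to be held in memory.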
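So, if the reference datasets are big enough to cause trouble, use a Join. The Join stage needs both the driving and the reference inputs sorted on the join keys, so a high-speed sort is performed on both. This sort can involve I/O if the data is large enough, but that I/O is highly optimized and sequential. Once the sort is over, the join reads each sorted input in a single forward pass, so the join processing itself is very fast and never involves paging or other I/O. Conversely, when the reference data fits comfortably in memory, the Lookup stage avoids the cost of sorting altogether.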
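The join side can be pictured as a classic sort-merge join. The sketch below is again a conceptual Python analogy under stated assumptions, not DataStage code: the function name and data are hypothetical, it performs an inner join only, and Python's `sorted()` works entirely in memory, whereas a parallel engine would use an external sort that can spill to scratch disk. What it does show is that after sorting, both inputs are consumed strictly in order, with no random probes.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def sort_merge_join(driving: List[Row], reference: List[Row], key: str) -> List[Row]:
    """Inner join: sort both inputs on `key`, then merge them in one forward pass."""
    # Step 1: sort both inputs on the join key (in memory here; see lead-in).
    left = sorted(driving, key=lambda r: r[key])
    right = sorted(reference, key=lambda r: r[key])

    out: List[Row] = []
    i = j = 0
    # Step 2: advance through both sorted lists together, never moving backwards.
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of reference rows sharing this key ...
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # ... and pair it with every driving row that has the same key,
            # which handles duplicate keys on both sides.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**r, **left[i]})  # driving values win on clashes
                i += 1
            j = j_end
    return out


if __name__ == "__main__":
    customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bo"}]
    orders = [{"order_id": 10, "cust_id": 2}, {"order_id": 11, "cust_id": 3},
              {"order_id": 12, "cust_id": 1}]
    print(sort_merge_join(orders, customers, "cust_id"))
```

Unlike the lookup analogy, the output comes back in key order, because both inputs were sorted before merging.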
SURAJKUMAR M
ReachKumar

Post by ReachKumar

Informative. Thanks, Suraj.
Regards,
Kumar