difference between normal and sparse lookup

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
ramesh_c
Participant
Posts: 27
Joined: Thu Dec 14, 2006 3:37 am
Location: delhi

difference between normal and sparse lookup

Post by ramesh_c »

Hi All,

I am new to datastage. I want to know the diffrence between the normal lookup and sparse lookup. In which scenarios we will use the normal lookup and sparse lookup.How internally the datastage execution will be done for these two types of lookups.

Thanks in advance,
Ramesh
us1aslam1us
Charter Member
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

Do a search on the exact terms. There are many related posts.
I haven't failed, I've found 10,000 ways that don't work.
Thomas Alva Edison(1847-1931)
ajay.prakash03
Participant
Posts: 54
Joined: Mon May 28, 2007 12:41 am
Location: Chennai

hi

Post by ajay.prakash03 »

Hi

bydefault Database takes normal looup. but for sparsh lookup if u input data is less than ur lookup data then u have to do sparslookup.
normal lookup u know....
boppanakrishna
Participant
Posts: 106
Joined: Thu Jul 27, 2006 10:05 pm
Location: Mumbai

Post by boppanakrishna »

hi,
If you have a scenario where the reference data is larger than the input data then you shoud opt for sparse lookup, else normal look up


Regards,
Boppana
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

A normal lookup is performed against a virtual Data Set associated with the reference input link. A hash table (index) is built "on the fly" so that keys can be found quickly in the virtual Data Set. Therefore a normal lookup shows as a composite operator in the score; one to create the lookup table (LUT), the other to use it. If there is a Lookup File Set on the other end of the reference input link, the hash table (LUT) has been pre-build and is stored in the Lookup File Set.

A sparse lookup executes queries directly against the database table specified in the stage producing into the reference input link. This tends to be substantially slower than a normal lookup, but is useful where the reference data set woud not fit into memory. However, depending on the circumstances, a Join stage may be preferable to a sparse Lookup.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

Normal will provide data for an in-memory look up whereas Sparse will access the database directly. Normal might provide poor performance when the reference data is huge as it has to load large data into memory. In such scenarios a join stage may work better(It will put the data onto an internal dataset file for each link, and then do the join based on the data provided). If your stream data is small then a sparse lookup is suggested as a row is retrieved from the database for each input row instead of the full reference data being brought into memory. Sparse lookup sends individual SQL statement for every incoming row (If stream data is huge you can imagine the number of times it has to hit DB and hence the down side on performance). It can be used when you want to get the next sequence number from your database (Again expensive overhead on your job as noted before). Also note that sparse lookup is only available for DB2 and Oracle. Normal Lookup stage can have multiple reference link but Sparse can only have one reference link. When normal is used it is a good practice to choose less volume data as reference data (Try reversing stream and reference for this if you can).
Joshy George
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

JoshGeorge wrote:Normal will provide data for an in-memory look up whereas Sparse will access the database directly. Normal might provide poor performance when the reference data is huge as it has to load large data into memory. In such scenarios a join stage may work better(It will put the data onto an internal dataset file for each link, and then do the join based on the data provided). If your stream data is small then a sparse lookup is suggested as a row is retrieved from the database for each input row instead of the full reference data being brought into memory. Sparse lookup sends individual SQL statement for every incoming row (If stream data is huge you can imagine the number of times it has to hit DB and hence the down side on performance). It can be used when you want to get the next sequence number from your database (Again expensive overhead on your job as noted before). Also note that sparse lookup is only available for DB2 and Oracle. Normal Lookup stage can have multiple reference link but Sparse can only have one reference link. When normal is used it is a good practice to choose less volume data as reference data (Try reversing stream and reference for this if you can).

just for clarification

use normal lookup when reference database is small to fit in the memory;

use sparse lookup when reference database is huge i.e ratio of input data and reference data is 1:100 or more

if all conditions above doesn't satisfy, its better to go for join stage
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
keshav0307
Premium Member
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

if both streams are huge then neither join nor lookup(normal or sparse) gives performance; better to put the main stream into a temporary table, join with the reference table at the database and extract values
Post Reply