difference between normal and sparse lookup

ramesh_c · Post by **ramesh_c** » Mon Jun 11, 2007 12:50 am

Hi All,

I am new to datastage. I want to know the diffrence between the normal lookup and sparse lookup. In which scenarios we will use the normal lookup and sparse lookup.How internally the datastage execution will be done for these two types of lookups.

Thanks in advance,
Ramesh

us1aslam1us · Post by **us1aslam1us** » Mon Jun 11, 2007 1:55 am

Do a search on the exact terms. There are many related posts.

ajay.prakash03 · Post by **ajay.prakash03** » Mon Jun 11, 2007 2:57 am

Hi

bydefault Database takes normal looup. but for sparsh lookup if u input data is less than ur lookup data then u have to do sparslookup.
normal lookup u know....

boppanakrishna · Post by **boppanakrishna** » Mon Jun 11, 2007 4:48 am

hi,
If you have a scenario where the reference data is larger than the input data then you shoud opt for sparse lookup, else normal look up

Regards,
Boppana

ray.wurlod · Post by **ray.wurlod** » Mon Jun 11, 2007 5:25 am

A normal lookup is performed against a virtual Data Set associated with the reference input link. A hash table (index) is built "on the fly" so that keys can be found quickly in the virtual Data Set. Therefore a normal lookup shows as a composite operator in the score; one to create the lookup table (LUT), the other to use it. If there is a Lookup File Set on the other end of the reference input link, the hash table (LUT) has been pre-build and is stored in the Lookup File Set.

A sparse lookup executes queries directly against the database table specified in the stage producing into the reference input link. This tends to be substantially slower than a normal lookup, but is useful where the reference data set woud not fit into memory. However, depending on the circumstances, a Join stage may be preferable to a sparse Lookup.

JoshGeorge · Post by **JoshGeorge** » Mon Jun 11, 2007 7:48 am

Normal will provide data for an in-memory look up whereas Sparse will access the database directly. Normal might provide poor performance when the reference data is huge as it has to load large data into memory. In such scenarios a join stage may work better(It will put the data onto an internal dataset file for each link, and then do the join based on the data provided). If your stream data is small then a sparse lookup is suggested as a row is retrieved from the database for each input row instead of the full reference data being brought into memory. Sparse lookup sends individual SQL statement for every incoming row (If stream data is huge you can imagine the number of times it has to hit DB and hence the down side on performance). It can be used when you want to get the next sequence number from your database (Again expensive overhead on your job as noted before). Also note that sparse lookup is only available for DB2 and Oracle. Normal Lookup stage can have multiple reference link but Sparse can only have one reference link. When normal is used it is a good practice to choose less volume data as reference data (Try reversing stream and reference for this if you can).

priyadarshikunal · Post by **priyadarshikunal** » Tue Jun 12, 2007 4:17 am

JoshGeorge wrote:Normal will provide data for an in-memory look up whereas Sparse will access the database directly. Normal might provide poor performance when the reference data is huge as it has to load large data into memory. In such scenarios a join stage may work better(It will put the data onto an internal dataset file for each link, and then do the join based on the data provided). If your stream data is small then a sparse lookup is suggested as a row is retrieved from the database for each input row instead of the full reference data being brought into memory. Sparse lookup sends individual SQL statement for every incoming row (If stream data is huge you can imagine the number of times it has to hit DB and hence the down side on performance). It can be used when you want to get the next sequence number from your database (Again expensive overhead on your job as noted before). Also note that sparse lookup is only available for DB2 and Oracle. Normal Lookup stage can have multiple reference link but Sparse can only have one reference link. When normal is used it is a good practice to choose less volume data as reference data (Try reversing stream and reference for this if you can).

just for clarification

use normal lookup when reference database is small to fit in the memory;

use sparse lookup when reference database is huge i.e ratio of input data and reference data is 1:100 or more

if all conditions above doesn't satisfy, its better to go for join stage

keshav0307 · Post by **keshav0307** » Tue Jun 12, 2007 11:08 am

if both streams are huge then neither join nor lookup(normal or sparse) gives performance; better to put the main stream into a temporary table, join with the reference table at the database and extract values