Sparse Lookup

sshettar · Post by **sshettar** » Fri Jun 12, 2009 8:57 am

Hi All,

I have this senario where i am getting the updated file for the month from the source . I had to check if the updated source file had any records which already existed in the table that i am going to load cause i need to reject those records and pass the once that do not exist into the table.
Hence i though abt going for a lookup with the reject link , where the output link serve as the records to be rejected and the reject link data are the once to be inserted into the table.
Since i had to do a lookup against the entire table and since the table would eventually grow big i decided to to sparse lookup.
The job design is as followd
Table(with sparse lookup)
.
.
.
. O/p Link
Updated Dataset............Lookup stage......................Output file(records
. that existed)
.
. Reject link
.
Dataset(records that needs to be inserted
into the table)

i am doing a lookup against 3 fields.
Here i am facing problem as when i tried doing the same job with Normal lookup . Its doing fine ( although right now we have processed only like 3 months of data init) but eventually this table will defenetly grow as we start processing more months.
But when i used Sparse lookup i see that the output link is getting more records than actual records in the main link(ie updated dataset) . I dont know how it is getting those many records when the input link is showing only few records and the referenece link only few records.
for example when the i/p and ref links are showing just 12 rows i see that the o/p link is showing 216654 rows.
I am suppose to get the same number of records as the updated dataset or fewer.
The metadata of Updated dataset and the table are exactly the same .

The whole purpose of doing this job is that our job failed once due to duplicate records( primary key integrity voilation)
So we opted this method.

Can somebody please advice where i'm going wrong in setting the sparse lookup plz

Any help is highly appreciated

Thanks

priyadarshikunal · Post by **priyadarshikunal** » Fri Jun 12, 2009 9:09 am

if you are getting more records in output then the first quostion will be, Are you defining correct keys for lookup?

nagarjuna · Post by **nagarjuna** » Fri Jun 12, 2009 9:13 am

For this type of scenario , you can go with change capture stage or slowly dimension stage .

How many maximum rows your input link can have ?Because too many rows in the sparse lookup will lead to performance hit .

ddevdutt · Post by **ddevdutt** » Fri Jun 12, 2009 9:25 am

I think you can do it using change capture stage or SCD stage like nagarjuna mentioned.

As for sparse lookup, have you mapped the key columns correctly?

platonfi · Post by **platonfi** » Mon Nov 23, 2009 7:04 am

sshettar wrote: ...
But when i used Sparse lookup i see that the output link is getting more records than actual records in the main link(ie updated dataset) . I dont know how it is getting those many records when the input link is showing only few records and the referenece link only few records.
for example when the i/p and ref links are showing just 12 rows i see that the o/p link is showing 216654 rows.
...

We are have exatly the same phenomena here without any explanation why this is happening. Could it be a bug with the sparse lookup?

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Mon Nov 23, 2009 7:13 am

Please start a new thread - even though they may be similar.

Also provide your issue in detail with your findings. i.e. does the reference has multiple values for the same keys compared ?

DSXchange

Sparse Lookup

Sparse Lookup

Re: Sparse Lookup