Sparse Lookup

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sshettar
Premium Member
Premium Member
Posts: 264
Joined: Thu Nov 30, 2006 10:37 am

Sparse Lookup

Post by sshettar »

Hi All,

I have this senario where i am getting the updated file for the month from the source . I had to check if the updated source file had any records which already existed in the table that i am going to load cause i need to reject those records and pass the once that do not exist into the table.
Hence i though abt going for a lookup with the reject link , where the output link serve as the records to be rejected and the reject link data are the once to be inserted into the table.
Since i had to do a lookup against the entire table and since the table would eventually grow big i decided to to sparse lookup.
The job design is as followd
Table(with sparse lookup)
.
.
.
. O/p Link
Updated Dataset............Lookup stage......................Output file(records
. that existed)
.
. Reject link
.
Dataset(records that needs to be inserted
into the table)

i am doing a lookup against 3 fields.
Here i am facing problem as when i tried doing the same job with Normal lookup . Its doing fine ( although right now we have processed only like 3 months of data init) but eventually this table will defenetly grow as we start processing more months.
But when i used Sparse lookup i see that the output link is getting more records than actual records in the main link(ie updated dataset) . I dont know how it is getting those many records when the input link is showing only few records and the referenece link only few records.
for example when the i/p and ref links are showing just 12 rows i see that the o/p link is showing 216654 rows.
I am suppose to get the same number of records as the updated dataset or fewer.
The metadata of Updated dataset and the table are exactly the same .

The whole purpose of doing this job is that our job failed once due to duplicate records( primary key integrity voilation)
So we opted this method.

Can somebody please advice where i'm going wrong in setting the sparse lookup plz

Any help is highly appreciated

Thanks
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

if you are getting more records in output then the first quostion will be, Are you defining correct keys for lookup?
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
nagarjuna
Premium Member
Premium Member
Posts: 533
Joined: Fri Jun 27, 2008 9:11 pm
Location: Chicago

Post by nagarjuna »

For this type of scenario , you can go with change capture stage or slowly dimension stage .

How many maximum rows your input link can have ?Because too many rows in the sparse lookup will lead to performance hit .
Nag
ddevdutt
Participant
Posts: 47
Joined: Wed Aug 22, 2007 2:38 pm

Post by ddevdutt »

I think you can do it using change capture stage or SCD stage like nagarjuna mentioned.

As for sparse lookup, have you mapped the key columns correctly?
DD

Success is right around the corner
platonfi
Premium Member
Premium Member
Posts: 27
Joined: Mon May 26, 2008 1:39 am

Re: Sparse Lookup

Post by platonfi »

sshettar wrote: ...
But when i used Sparse lookup i see that the output link is getting more records than actual records in the main link(ie updated dataset) . I dont know how it is getting those many records when the input link is showing only few records and the referenece link only few records.
for example when the i/p and ref links are showing just 12 rows i see that the o/p link is showing 216654 rows.
...
We are have exatly the same phenomena here without any explanation why this is happening. Could it be a bug with the sparse lookup?
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Please start a new thread - even though they may be similar.

Also provide your issue in detail with your findings. i.e. does the reference has multiple values for the same keys compared ?
Post Reply