Lookup file set best place to use it

Posted: Tue Apr 13, 2010 4:10 am
by myukassign
In one of Ray's posts I saw that a Data Set is the best stage to use if I want to store data for a reference link and the data volume is huge.

So where, in practice, is the Lookup File Set designed to be used? Can anyone give me a scenario where this stage is the best fit?

I have been doing design work these days, and I am sorry for asking this question, but this stage has left me confused.

Re: Lookup file set best place to use it

Posted: Tue Apr 13, 2010 4:14 am
by madhukar
A Lookup File Set should be used if the reference data fits into memory and is used in multiple jobs.

Posted: Tue Apr 13, 2010 4:43 am
by srinivas.g
For a sparse lookup it is better to use a Lookup File Set.

Posted: Tue Apr 13, 2010 5:22 am
by ray.wurlod
I think you must have misinterpreted what I said.

The only place you can use a Lookup File Set is to service the reference input to a Lookup stage. The benefit you get is that the index on the lookup key is pre-built - it does not have to be built "on the fly" by the LUT_CreateOp operator.

And, yes, the total volume of reference data must still fit into memory.

Data Sets are really more about staging data between parallel jobs, rather than servicing lookups. However, it would be an interesting exercise to determine whether the cost of the import operator associated with a Lookup File Set would be greater or less than the cost of the LUT_CreateOp operator building the index if a Data Set were used.
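
As a rough illustration of the trade-off described above, here is a minimal Python sketch. It is only an analogy, not how DataStage stores a Lookup File Set or how LUT_CreateOp works internally. All names (build_index, save_index, load_index, lookup, the cust_id key and the customers.idx file) are hypothetical. Case A builds the key index in memory on every run; case B builds it once, persists it, and reloads it, paying a read cost instead of a build cost. Either way the whole index has to fit in memory.

```python
import pickle
import tempfile
from pathlib import Path


def build_index(rows, key):
    """Build an in-memory hash index on the lookup key (the "on the fly" case)."""
    index = {}
    for row in rows:
        index[row[key]] = row
    return index


def save_index(index, path):
    """Persist a pre-built index so that later runs can skip the build step."""
    with open(path, "wb") as f:
        pickle.dump(index, f)


def load_index(path):
    """Reload a persisted index; this is the read cost paid instead of a rebuild."""
    with open(path, "rb") as f:
        return pickle.load(f)


def lookup(stream_rows, index, key):
    """Pair each stream row with its reference row; unmatched keys give None."""
    return [(row, index.get(row[key])) for row in stream_rows]


# Hypothetical reference data and stream (primary input) data.
reference = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
stream = [{"cust_id": 2, "amount": 10}, {"cust_id": 3, "amount": 5}]

# Case A: the index is built every time the job runs.
index_a = build_index(reference, "cust_id")

# Case B: the index is built once, stored, and reloaded by each job that needs it.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "customers.idx"
    save_index(build_index(reference, "cust_id"), path)
    index_b = load_index(path)

result_a = lookup(stream, index_a, "cust_id")
result_b = lookup(stream, index_b, "cust_id")
```

Both cases return the same matches; what differs is where the time goes. Timing build_index against load_index on realistic volumes would be one way to approach the comparison suggested above, though the numbers from a sketch like this would say nothing about the actual operators.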

Posted: Wed Apr 14, 2010 11:53 pm
by myukassign
In my experience, the Lookup File Set did a better job than the Data Set.

I prefer the Lookup File Set over the Data Set.

Thanks for your reply. I'm closing this thread.