Lookup fileset vs Dataset

Posted: Mon Apr 24, 2006 4:33 am
by koolnitz
Hi,

I have a few queries related to filesets:

1. How does a lookup fileset differ from a fileset?
2. How does DS store a lookup fileset, and how does that differ from storing a dataset?
3. What if I use a dataset in place of a lookup fileset for the lookup?
4. Are there any other significant differences between the two?

Thanks in advance!

Posted: Mon Apr 24, 2006 6:51 am
by vmcburney
I did some investigation of lookup filesets versus datasets, which I posted to my blog in an entry on parallel lookup types. On larger lookups the lookup fileset can be used immediately; other types of lookup need to be loaded into a temporary lookup fileset, which is removed after the job finishes.

Posted: Mon Apr 24, 2006 7:45 am
by koolnitz
vmcburney wrote: On larger lookups the lookup fileset can be used immediately; other types of lookup need to be loaded into a temporary lookup fileset, which is removed after the job finishes.
Why does DS load other types of lookup into a temporary lookup fileset?
Can't it load the data directly into memory and perform the lookup there? It flushes the memory once the job finishes anyway.

Posted: Mon Apr 24, 2006 4:18 pm
by ray.wurlod
koolnitz wrote: Why does DS load other types of lookup into a temporary lookup fileset?
It does not load other types of lookup into a temporary Lookup File Set - this was a conceptual explanation. If you do not have an explicit Lookup File Set in your design, and you are not performing a sparse lookup, then the external data are loaded into a virtual Data Set (in memory) against which lookups are performed. You can see that this occurs by inspecting the generated OSH.
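As a purely conceptual illustration of a normal (non-sparse) lookup against reference data held in memory, here is a minimal Python sketch. It is not how DataStage is implemented: a plain dict stands in for the virtual Data Set, and the function names, column names and sample rows are all hypothetical.

```python
# Conceptual sketch only: a dict stands in for the in-memory virtual Data Set.
# Function names, column names and sample rows are hypothetical.

def build_lookup_table(reference_rows, key):
    """Load every reference row into memory, keyed on the lookup column."""
    return {row[key]: row for row in reference_rows}

def lookup(stream_rows, table, key):
    """Pair each stream row with its matching reference row, or None."""
    for row in stream_rows:
        yield row, table.get(row[key])

reference = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
stream = [{"cust_id": 2, "amount": 10}, {"cust_id": 3, "amount": 5}]

# The reference data is loaded once, then probed for every stream row.
table = build_lookup_table(reference, "cust_id")
results = list(lookup(stream, table, "cust_id"))
```

The point of the sketch is only the shape of the operation: the reference data is loaded in full before the first stream row is processed, and nothing is written to a temporary file on disk.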

Posted: Tue Apr 25, 2006 8:21 am
by koolnitz
Another question comes to mind:

I have a job which uses a lookup fileset for lookup. If I run 5 instances of the same job simultaneously, will those instances share a single copy of the lookup fileset in memory, or will each instance bring its own copy of the fileset into memory?

Posted: Tue Apr 25, 2006 8:25 am
by ArndW
Each process loads into local virtual memory, not into shared global memory, so they will each have their "own" copy in memory.
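Taking that description at face value, the memory cost scales with the number of concurrent instances. A rough back-of-the-envelope sketch in Python follows; the function name and the 200 MB figure are hypothetical and are not measurements from any real job.

```python
# Rough estimate only: assumes every instance holds one full private copy.
# The function name and the sizes used below are hypothetical.

def total_lookup_memory_mb(instances, copy_size_mb):
    """Memory used when each instance keeps its own copy of the lookup data."""
    return instances * copy_size_mb

# Five simultaneous instances, each holding a hypothetical 200 MB copy.
estimate_mb = total_lookup_memory_mb(5, 200)
```

With shared memory the figure would stay near the size of one copy; with private copies it grows linearly with the number of instances.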

Posted: Tue Apr 25, 2006 8:45 am
by DSguru2B
Lookup File Sets are comparatively slower than Data Sets.

Posted: Tue Apr 25, 2006 9:07 am
by amsh76
Two issues I always face with the Lookup File Set:
1. How to view the data that is loaded in the Lookup File Set.
2. You cannot append data to it.

Also, unlike a Hash File lookup, the Lookup stage lacks the capability to show only the number of records that matched the source. I am not sure whether anything has been done to address this. This makes auditing difficult.
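To make the audit figure in question concrete, here is a small Python sketch of the counts one would want from a lookup: rows in, rows matched, rows unmatched. It is an illustration of the idea only, not a DataStage feature; the function name and sample keys are hypothetical.

```python
# Illustration of the audit counts only; not a DataStage API.
# The function name and sample keys are hypothetical.

def audit_lookup(stream_keys, reference_keys):
    """Count how many stream rows find a match in the reference data."""
    reference = set(reference_keys)
    matched = sum(1 for key in stream_keys if key in reference)
    return {
        "input": len(stream_keys),
        "matched": matched,
        "unmatched": len(stream_keys) - matched,
    }

# Four source rows; only key 2 exists in the reference data.
counts = audit_lookup([1, 2, 3, 2], [2, 4])
```

Duplicate source keys are counted once per row, since the audit is about source records rather than distinct key values.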

Posted: Tue Apr 25, 2006 5:06 pm
by ray.wurlod
Lookup File Set is accessed from disk. Only its index is loaded into memory, as far as I am aware.
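The general idea of keeping only an index in memory while the records stay on disk can be sketched in Python as below. This is an assumption-laden illustration and says nothing about the actual Lookup File Set file format: the comma-separated layout, the function names and the sample records are all hypothetical.

```python
import os
import tempfile

# Sketch of "index in memory, data on disk". The file layout, function
# names and sample records are hypothetical, not the real file set format.

def write_records(path, records):
    """Write key,value lines to disk and return {key: byte offset}."""
    index = {}
    with open(path, "wb") as f:
        for key, value in records:
            index[key] = f.tell()  # remember where this record starts
            f.write(f"{key},{value}\n".encode("utf-8"))
    return index

def fetch(path, index, key):
    """Use the in-memory index to seek straight to one record on disk."""
    offset = index.get(key)
    if offset is None:
        return None
    with open(path, "rb") as f:
        f.seek(offset)
        line = f.readline().decode("utf-8").rstrip("\n")
    return line.split(",", 1)[1]

fd, data_path = tempfile.mkstemp(suffix=".dat")
os.close(fd)
index = write_records(data_path, [("A1", "alpha"), ("B2", "beta")])
value = fetch(data_path, index, "B2")
missing = fetch(data_path, index, "Z9")
os.remove(data_path)
```

Only the dict of offsets lives in memory here, so the memory footprint depends on the number of keys rather than on the width of the records.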