Lookup Fileset, Fileset or Dataset for Reference

Posted: Thu Feb 24, 2011 4:51 am
by clarcombe
Dear esteemed colleagues,

Working at a new site where they are using Lookup Filesets with Lookup stage.

I have never used these, as I have always preferred to use datasets (or they have been imposed by a previous incumbent).

Having read various responses about lookup filesets, filesets and datasets, I wondered if anyone has a definitive table of the cases each should be used for and their respective advantages and disadvantages.

Thanks

Colin

Posted: Thu Feb 24, 2011 4:57 am
by Shaanpriya
Lookup filesets are preferred because:
1) Indexing details of the columns are preserved.
2) The lookup column details are also stored.

Posted: Tue Jul 17, 2012 1:38 am
by clarcombe
I have discovered one use of lookup filesets: the fileset is present across all nodes automatically.

Posted: Tue Jul 17, 2012 7:00 pm
by ray.wurlod
When you use a Lookup stage the reference input is loaded into memory and an index is created on its defined key on the fly.

Except that, if a Lookup File Set is used, it has the index created within its structure at the time that it is created/populated. That means that the cost of creating the index is time shifted away from the main job run.

It also explains why View Data is not available for Lookup File Set; it is not set up for streaming rows, it is only set up for key-based access (that is, lookups).
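To make the time-shifting point concrete, here is a minimal Python sketch, not DataStage code. It assumes a plain dict as the key index and uses pickle purely as a stand-in for "index stored with the data"; the names REFERENCE_ROWS, build_index, write_prebuilt_index and load_prebuilt_index are hypothetical and say nothing about the real Lookup File Set format.

```python
import os
import pickle
import tempfile

# Hypothetical reference rows: (key, value) pairs standing in for a reference input.
REFERENCE_ROWS = [(1, "alpha"), (2, "beta"), (3, "gamma")]


def build_index(rows):
    """Build an in-memory key -> value index (the cost paid 'on the fly')."""
    return {key: value for key, value in rows}


def write_prebuilt_index(rows, path):
    """Build the index once, at creation time, and persist it."""
    with open(path, "wb") as fh:
        pickle.dump(build_index(rows), fh)


def load_prebuilt_index(path):
    """At job run time, load the already-built index; no indexing step."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def lookup(index, key):
    """Key-based access only; returns None when the key is absent."""
    return index.get(key)


def demo():
    # Case 1: index built during the main job run.
    on_the_fly = build_index(REFERENCE_ROWS)

    # Case 2: index built earlier and only loaded during the main job run.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "reference.idx")
        write_prebuilt_index(REFERENCE_ROWS, path)
        prebuilt = load_prebuilt_index(path)

    return lookup(on_the_fly, 2), lookup(prebuilt, 2), lookup(prebuilt, 99)
```

Both paths answer the same lookups; the only difference in this sketch is when the indexing work happens.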

Posted: Wed Jul 18, 2012 5:02 am
by clarcombe
ray.wurlod wrote: It also explains why View Data is not available for Lookup File Set; it is not set up for streaming rows, it is only set up for key-based access (that is, lookups).
Ah, I wondered why I couldn't see the data. Thanks, Ray.

Posted: Wed Jul 18, 2012 5:09 am
by ArndW
A couple of years ago I did some comparative testing for performance differences between lookup filesets and datasets and found that both performed at almost the same speed. Since lookup filesets were (and remain) black boxes with no facility to view the data, I chose to stick with using datasets even when I knew that they would mainly be used for lookups - the limitations imposed by the lookup fileset outweighed any performance benefits.

Unless performance has changed in the interim I'll probably stick with datasets for the time being.

Posted: Wed Jul 18, 2012 5:35 am
by clarcombe
That's interesting, Arnd. Did you compare increasing volume sizes too?

Posted: Wed Jul 18, 2012 5:49 am
by ArndW
Yes, I did - I can't recall the volumes, but it was on a big AIX box with a fast SAN and I went up to a lot of Mb. I think I made the sizes such that the jobs ran at least 10 minutes so I could get a good signal-to-noise ratio and consistent results.

Posted: Wed Jul 18, 2012 5:55 am
by clarcombe
I have just finished a job working with a consultant and he swore by lookup filesets, so I had to change my job to work with them!!

But it's good to know that I have a reference now in case I get hit with the same question.

Thanks

Posted: Wed Jul 18, 2012 6:08 pm
by ray.wurlod
It's always a compromise. Data Sets don't need to invoke the import operator; they work with copy. I believe Lookup File Sets do need the overhead of import (for the data, not the index). So, with Data Sets you need to build the index; with Lookup File Sets you have to translate the data. Arnd's results suggest that the costs are roughly equivalent.
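The compromise can be pictured with a small Python sketch. It is an illustration under assumptions, not a model of the actual operators: NATIVE_ROWS, PREBUILT_INDEX, the pipe-delimited record layout and import_record are all hypothetical, chosen only to show one path paying for an index build and the other paying for data translation.

```python
# Hypothetical "Data Set"-like input: rows already in native, typed form.
NATIVE_ROWS = [(1, "alpha", 10.5), (2, "beta", 20.0)]

# Hypothetical "Lookup File Set"-like input: the key index already exists,
# but each record is stored as delimited text that must be translated.
PREBUILT_INDEX = {1: "1|alpha|10.5", 2: "2|beta|20.0"}


def build_index(rows):
    """Run-time cost on the native path: index the rows by their first field."""
    return {row[0]: row for row in rows}


def lookup_native(index, key):
    """Rows are already typed, so a hit needs no translation."""
    return index.get(key)


def import_record(raw):
    """Run-time cost on the prebuilt path: translate text into typed fields."""
    key, name, amount = raw.split("|")
    return int(key), name, float(amount)


def lookup_prebuilt(index, key):
    """No index build at run time, but each hit must be translated."""
    raw = index.get(key)
    return import_record(raw) if raw is not None else None
```

Both functions return the same typed row for the same key; which path is cheaper depends on how the index build compares with the translation work, which is what the test results above addressed.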