Lookup Fileset, Fileset or Dataset for Reference

clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

Dear esteemed colleagues,

I am working at a new site where they use Lookup Filesets with the Lookup stage.

I have never used these, as I have always preferred datasets (or they were imposed by the previous incumbent).

Having read various responses about lookup filesets, filesets and datasets, I wondered if anyone has a definitive table of the cases each should be used for and their respective advantages and disadvantages.

Thanks

Colin
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
Shaanpriya
Participant
Posts: 22
Joined: Thu Sep 11, 2008 11:47 pm
Location: Bangalore

Post by Shaanpriya »

Lookup filesets are preferred because:
1) The indexing details of the columns are preserved.
2) The lookup column details are also stored.
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

I have discovered one use for lookup filesets: the fileset is automatically present across all nodes.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

When you use a Lookup stage, the reference input is loaded into memory and an index is created on its defined key on the fly.

The exception is a Lookup File Set, which has the index created within its structure at the time that it is created/populated. That means that the cost of creating the index is time-shifted away from the main job run.
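
As a rough illustration of that time shift, here is a minimal sketch in plain Python. It is only an analogy, not DataStage code or a description of its internals; the function names, the `cust_id` key and the sample rows are all made up for the example.

```python
# Conceptual analogy only: plain Python, not DataStage or its internals.

def build_index(rows, key):
    """Build an in-memory hash index mapping each key value to its row."""
    return {row[key]: row for row in rows}

reference_rows = [
    {"cust_id": 1, "name": "Alice"},
    {"cust_id": 2, "name": "Bob"},
]

# Case 1: the index is built on the fly, inside the job that does the lookups
# (analogous to a Lookup stage fed by an ordinary reference input).
def run_job_with_on_the_fly_index(stream, rows):
    index = build_index(rows, "cust_id")  # cost is paid during the main run
    return [index.get(rec["cust_id"]) for rec in stream]

# Case 2: the index was built earlier by a separate load step and is reused
# (analogous to populating a Lookup File Set ahead of time).
prebuilt_index = build_index(reference_rows, "cust_id")  # cost is paid up front

def run_job_with_prebuilt_index(stream, index):
    return [index.get(rec["cust_id"]) for rec in stream]

stream = [{"cust_id": 2}, {"cust_id": 3}]
result_a = run_job_with_on_the_fly_index(stream, reference_rows)
result_b = run_job_with_prebuilt_index(stream, prebuilt_index)
```

Both variants return the same matches; the only difference is when the indexing work happens.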

It also explains why View Data is not available for Lookup File Set; it is not set up for streaming rows, it is only set up for key-based access (that is, lookups).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

ray.wurlod wrote: It also explains why View Data is not available for Lookup File Set; it is not set up for streaming rows, it is only set up for key-based access (that is, lookups).
Ah, I wondered why I couldn't see the data. Thanks, Ray.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

A couple of years ago I did some comparative testing for performance differences between lookup filesets and datasets and found that both performed at almost the same speed. Since lookup filesets were (and remain) black boxes with no facility to view the data, I chose to stick with datasets even when I knew that they would mainly be used for lookups - the limitations imposed by the lookup fileset outweighed any performance benefits.

Unless performance has changed in the interim I'll probably stick with datasets for the time being.
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

That's interesting, Arnd. Did you compare increasing volume sizes too?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Yes, I did - I can't recall the volumes, but it was on a big AIX box with a fast SAN and I went up to a lot of MB. I think I made the sizes such that the jobs ran at least 10 minutes so I could get a good signal-to-noise ratio and consistent results.
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

I have just finished a job working with a consultant and he swore by lookup filesets so I had to change my job to work with them!!

But it's good to know that I have a reference now in case I get hit with the same question.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It's always a compromise. Data Sets don't need to invoke the import operator; they work with copy. I believe Lookup File Sets do need the overhead of import (for the data, not the index). So, with Data Sets you need to build the index; with Lookup File Sets you have to translate the data. Arnd's results suggest that the costs are roughly equivalent.
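
To make that trade-off concrete, here is a toy sketch in plain Python. It is an analogy under stated assumptions, not how the parallel engine actually stores anything: JSON text stands in for "data that needs importing", and the store names, loader functions and sample rows are hypothetical.

```python
import json

# Conceptual analogy only: plain Python, not the DataStage storage formats.
rows = [{"cust_id": 1, "name": "Alice"}, {"cust_id": 2, "name": "Bob"}]

# "Data Set"-like store: rows are kept in ready-to-use form, with no index.
dataset_store = list(rows)

# "Lookup File Set"-like store: the key index exists already, but each row
# is held in an external (text) form that must be translated before use.
lookup_store = {row["cust_id"]: json.dumps(row) for row in rows}

def load_from_dataset(store):
    # Rows are usable as-is ("copy"), but the key index must be built now.
    return {row["cust_id"]: row for row in store}

def load_from_lookup_store(store):
    # The key index is already there, but each row must be decoded ("import").
    return {key: json.loads(text) for key, text in store.items()}

table_a = load_from_dataset(dataset_store)
table_b = load_from_lookup_store(lookup_store)
```

Either loader ends up with the same in-memory lookup table; each simply pays a different one-off cost to get there, which fits with Arnd's finding that the two came out about even.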