Lookup Fileset, Fileset or Dataset for Reference
Dear esteemed colleagues,
I am working at a new site where they use Lookup Filesets with the Lookup stage.
I have never used these, as I have always preferred datasets (or datasets were imposed by a previous incumbent).
Having read various responses about lookup filesets, filesets and datasets, I wondered if anyone has a definitive table of the cases each should be used for and their respective advantages and disadvantages.
Thanks
Colin
Colin Larcombe
-------------------
Certified IBM Infosphere Datastage Developer
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
When you use a Lookup stage the reference input is loaded into memory and an index is created on its defined key on the fly.
The exception is a Lookup File Set, which has the index created within its structure at the time that it is created/populated. That means that the cost of creating the index is time-shifted away from the main job run.
It also explains why View Data is not available for Lookup File Set; it is not set up for streaming rows, it is only set up for key-based access (that is, lookups).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
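The difference described above, building the index on the fly versus having it built when the reference file is populated, can be sketched in Python. This is only a conceptual analogy and not how DataStage stores or reads either structure; the function names, the pickle file and the sample rows are all hypothetical.

```python
import os
import pickle
import tempfile


def build_index(reference_rows, key):
    """Build an in-memory hash index mapping key value -> reference row."""
    return {row[key]: row for row in reference_rows}


def lookup_on_the_fly(stream_rows, reference_rows, key):
    """Analogy for a Data Set reference: index built inside the lookup job."""
    index = build_index(reference_rows, key)  # cost paid during the main run
    return [index.get(row[key]) for row in stream_rows]


def create_lookup_file(reference_rows, key, path):
    """Analogy for populating a Lookup File Set: index built and saved up front."""
    with open(path, "wb") as f:
        pickle.dump(build_index(reference_rows, key), f)


def lookup_prebuilt(stream_rows, path, key):
    """Main run only loads the saved index and probes it by key."""
    with open(path, "rb") as f:
        index = pickle.load(f)
    return [index.get(row[key]) for row in stream_rows]


# Hypothetical sample data: two reference rows, two stream rows (one unmatched).
reference = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]
stream = [{"id": 2}, {"id": 3}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reference.idx")
    create_lookup_file(reference, "id", path)       # earlier job pays the index cost
    prebuilt = lookup_prebuilt(stream, path, "id")  # main job only probes

on_the_fly = lookup_on_the_fly(stream, reference, "id")
```

Both paths return the same matches; the only thing that moves is when the indexing work is done.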
ray.wurlod wrote: It also explains why View Data is not available for Lookup File Set; it is not set up for streaming rows, it is only set up for key-based access (that is, lookups).
Ah, I wondered why I couldn't see the data. Thanks Ray
Colin Larcombe
-------------------
Certified IBM Infosphere Datastage Developer
A couple of years ago I did some comparative testing for performance differences between lookup filesets and datasets and found that both performed at almost the same speed. Since lookup filesets were (and remain) black boxes with no facility to view the data, I chose to stick with datasets even when I knew that they would mainly be used for lookups - the limitations imposed by the lookup fileset outweighed any performance benefits.
Unless performance has changed in the interim I'll probably stick with datasets for the time being.
Yes, I did - I can't recall the volumes, but it was on a big AIX box with a fast SAN and I went up to a lot of MB. I think I sized the data so that the jobs ran for at least 10 minutes, so I could get a good signal-to-noise ratio and consistent results.
I have just finished a job working with a consultant and he swore by lookup filesets so I had to change my job to work with them!!
But it's good to know that I have a reference now in case I get hit with the same question.
Thanks
Colin Larcombe
-------------------
Certified IBM Infosphere Datastage Developer
It's always a compromise. Data Sets don't need to invoke the import operator; they work with copy. I believe Lookup File Sets do need the overhead of import (for the data, not the index). So, with Data Sets you need to build the index; with Lookup File Sets you have to translate the data. Arnd's results suggest that the costs are roughly equivalent.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.