Lookup fileset vs Dataset

koolnitz
Participant
Posts: 138
Joined: Wed Sep 07, 2005 5:39 am

Lookup fileset vs Dataset

Post by koolnitz »

Hi,

I have a few questions about filesets:

1. How does a lookup fileset differ from a fileset?
2. How does DS store a lookup fileset, and how does that differ from how it stores a dataset?
3. What happens if I use a dataset in place of a lookup fileset for a lookup?
4. Are there any other significant differences between the two?

Thanks in advance!
Nitin Jain | India

If everything seems to be going well, you have obviously overlooked something.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

I did some investigation of a lookup fileset versus a dataset, which I posted to my blog (parallel lookup types). On larger lookups the lookup fileset can be used immediately; other types of lookup need to be loaded into a temporary lookup fileset, which is removed after the job finishes.
koolnitz
Participant
Posts: 138
Joined: Wed Sep 07, 2005 5:39 am

Post by koolnitz »

vmcburney wrote:On larger lookups the lookup fileset can be used immediately, other types of lookup need to be loaded into a temporary lookup fileset which is removed after the job finishes.
Why does DS load other types of lookup into a temporary lookup fileset?
Can't it load the data directly into memory and perform the lookup there? Either way, the memory is released once the job finishes.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

koolnitz wrote:Why does DS load other types of lookup into temporary lookup fileset?
It does not load other types of lookup into a temporary Lookup File Set - this was a conceptual explanation. If you do not have an explicit Lookup File Set in your design, and you are not performing a sparse lookup, then the external data are loaded into a virtual Data Set (in memory) against which lookups are performed. You can see that this occurs by inspecting the generated OSH.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
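
To make the distinction in the reply above concrete, here is a minimal Python sketch. It is not DataStage code or OSH; it only illustrates the general idea of an in-memory lookup (reference data loaded once, then probed per row) versus a sparse lookup (one database query per source row). The table name `customers`, its columns, the sample data, and all function names are hypothetical.

```python
import sqlite3

# Hypothetical reference data; table and column names are illustrative only.
REFERENCE = [(1, "Alice"), (2, "Bob")]


def normal_lookup(source_keys, reference_rows):
    """Load every reference row into memory once, then probe per source row."""
    in_memory = {key: name for key, name in reference_rows}
    return [(k, in_memory.get(k)) for k in source_keys]


def sparse_lookup(source_keys, conn):
    """Send one query to the database per source row; nothing is preloaded."""
    results = []
    for k in source_keys:
        row = conn.execute(
            "SELECT name FROM customers WHERE cust_id = ?", (k,)
        ).fetchone()
        results.append((k, row[0] if row else None))
    return results


def make_db(reference_rows):
    """Build a throwaway in-memory SQLite database holding the reference rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", reference_rows)
    return conn


if __name__ == "__main__":
    keys = [2, 3, 1]
    print(normal_lookup(keys, REFERENCE))
    print(sparse_lookup(keys, make_db(REFERENCE)))
```

Both functions return the same matches; they differ in where the reference data lives while the lookup runs. The in-memory form pays a one-time load cost, while the sparse form pays a round trip per source row.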
koolnitz
Participant
Posts: 138
Joined: Wed Sep 07, 2005 5:39 am

Post by koolnitz »

Another question arising in my mind:

I have a job that uses a lookup fileset for its lookup. If I run 5 instances of the same job simultaneously, will those instances share one copy of the lookup fileset in memory, or will each instance bring its own copy of the fileset into memory?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Each process loads into local virtual memory, not into shared global memory, so they will each have their "own" copy in memory.
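
As a rough back-of-the-envelope illustration of what "each has its own copy" means for memory, the sketch below simply multiplies the size of the lookup data by the number of processes holding a copy. The function name and the 200 MB figure are made up for the example, and the sketch assumes every process loads the full lookup data, which may not hold for every partitioning scheme.

```python
MB = 1024 * 1024


def total_lookup_memory(table_bytes: int, process_count: int) -> int:
    """Total bytes used if every process keeps a private copy of the lookup data."""
    if table_bytes < 0 or process_count < 0:
        raise ValueError("sizes and counts must be non-negative")
    return table_bytes * process_count


if __name__ == "__main__":
    # Hypothetical figures: a 200 MB lookup table and 5 simultaneous instances.
    print(total_lookup_memory(200 * MB, 5) // MB, "MB")
```

With shared memory the total would stay near the size of one copy; with private copies it grows linearly with the number of processes.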
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Lookup file sets are comparatively slower than Datasets.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
amsh76
Charter Member
Posts: 118
Joined: Wed Mar 10, 2004 10:58 pm

Post by amsh76 »

Two issues I always face with a Lookup Fileset:
1. How to view the data that is loaded in the Lookup File Set.
2. You cannot append data to it.

Secondly, unlike a Hash File lookup, the Lookup stage cannot show just the number of records that found a match with the source. I am not sure whether anything has been done to address this issue. This makes auditing difficult.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Lookup File Set is accessed from disk. Only its index is loaded into memory, as far as I am aware.
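
The general technique described above (rows stay on disk, only a key index is held in memory) can be sketched in Python as follows. This is not the actual on-disk format or access method of a Lookup File Set, which is not described in this thread; the pipe-delimited text file, the function names, and the sample rows are all assumptions made for illustration.

```python
import os
import tempfile


def build_index(path):
    """Scan the file once and map each key to the byte offset of its row."""
    index = {}
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            key = line.split(b"|", 1)[0].decode()
            index[key] = offset
    return index


def lookup(path, index, key):
    """Use the in-memory index to seek straight to the row on disk."""
    offset = index.get(key)
    if offset is None:
        return None
    with open(path, "rb") as f:
        f.seek(offset)
        return f.readline().decode().rstrip("\n").split("|")


if __name__ == "__main__":
    # Hypothetical reference rows written to a temporary file.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(b"1|Alice\n2|Bob\n3|Carol\n")
    try:
        idx = build_index(tmp.name)
        print(lookup(tmp.name, idx, "2"))
        print(lookup(tmp.name, idx, "9"))
    finally:
        os.remove(tmp.name)
```

Only the key-to-offset dictionary occupies memory, so the footprint scales with the number of keys rather than with the full row width, at the cost of a disk read per lookup.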