Look up file set Vs Actual Database lookup

yuva010 · Post by **yuva010** » Mon Jun 09, 2008 4:41 am

Hi,
We need to look up tables from Oracle, there are two methods -
1. Either we should use direct database table to look up and fetch
2. Load the look up data set files from table and Look up on these files to fetch.

Which one is better?

ArndW · Post by **ArndW** » Mon Jun 09, 2008 4:55 am

You need to define "better". This could mean many things - for instance quickest runtime, or least impact on the database, or using the least system resources, etc.
How big is the table and how big is the source data in comparison? How often does the table change?

yuva010 · Post by **yuva010** » Mon Jun 09, 2008 5:14 am

We are interested in Quick runtime and least resource utilization.
There are different look up tables for diff jobs.
Few tables has millions of records others has very few records for look up.
Source data is always 1 million or so.

swapnilverma · Post by **swapnilverma** » Mon Jun 09, 2008 6:18 am

Its based on data volume of input VS reference and the your system configuation.

If input volume is significantly smaller then reference table then go far sparce lookup. ( one SQL query for one record )

if both input are same then go and still the volume is not huge ( based on ur system configuration and apt used) go for direct table lookup.

If both Input and reference are coming from TABLE then you can go for SQL join as well.

Else use lookup file set or JOIN stage ...

You Decide

ArndW · Post by **ArndW** » Mon Jun 09, 2008 6:36 am

If your lookup data changes once a month, then it makes sense to create a lookup fileset and use that on each run. If the reference data changes daily or each run of the job you can save the resources required to build the lookup and do it directly. This is just one of several choices you can make.

The goals of minimizing runtime as well as minimizing resource use are usually diametrically opposed. That is similar to expect MPG use on a car to get better with faster speeds - it doesn't work that way.

Mimizing resource use per row processed is a possible goal, but won't necessarily result in maximizing throughput.

vmcburney · Post by **vmcburney** » Mon Jun 09, 2008 6:02 pm

My recollection, when I had a look at the files getting created when the job runs, is that the job runs the same way using both methods but the job startup will be different. If you already have a lookup fileset the job will start processing straight away. If you don't have a lookup fileset and the lookup data cannot all fit into RAM memory the job will create the equivalent of lookup filesets for you in the node scratch space. So for very large lookups either way you get a lookup fileset - it just depends on whether you want to create it every time you run the job or create it less often to a schedule.

yuva010 · Post by **yuva010** » Tue Jun 10, 2008 1:04 am

Hi,
The decision I have to make is - for least runtime and saving utilization of resources - based on volumn of Input Volumn and Look up volumn -

Source/IP Volumn Lookup Volumn
Low Moderate Large
Low Direct table Fileset or DT? Fileset or DT?
Moderate
Large