
Lookup file set vs. actual database lookup

Posted: Mon Jun 09, 2008 4:41 am
by yuva010
Hi,
We need to perform lookups against tables in Oracle, and there are two methods:
1. Look up directly against the database table and fetch from it.
2. Load the lookup data from the table into lookup file sets and perform the lookups against those files.

Which one is better?

Posted: Mon Jun 09, 2008 4:55 am
by ArndW
You need to define "better". It could mean many things: quickest runtime, least impact on the database, least use of system resources, and so on.
How big is the table, and how big is the source data in comparison? How often does the table change?

Posted: Mon Jun 09, 2008 5:14 am
by yuva010
We are interested in quick runtime and the least resource utilization.
There are different lookup tables for different jobs: a few tables have millions of records, while others have very few records to look up.
The source data is always around 1 million rows.

Posted: Mon Jun 09, 2008 6:18 am
by swapnilverma
It depends on the data volume of the input versus the reference data, and on your system configuration.

If the input volume is significantly smaller than the reference table, go for a sparse lookup (one SQL query per record).

If the two are of similar size and the volume is not huge (based on your system configuration and the APT configuration used), go for a direct table lookup.

If both the input and the reference come from tables, you can go for an SQL join as well.

Otherwise, use a lookup file set or a Join stage.

You decide.
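To make the trade-off concrete, here is a minimal Python sketch (not DataStage syntax; the connection, table, and column names are hypothetical) contrasting a sparse lookup, which fires one query per input row, with a normal lookup, which loads the whole reference table into memory first.

Code:

import sqlite3  # stand-in for an Oracle connection; ref_table/id/descr are made-up names

con = sqlite3.connect("reference.db")

def sparse_lookup(input_rows):
    """Sparse lookup: one SQL query per input row."""
    cur = con.cursor()
    for row in input_rows:
        cur.execute("SELECT descr FROM ref_table WHERE id = ?", (row["id"],))
        hit = cur.fetchone()
        yield {**row, "descr": hit[0] if hit else None}

def normal_lookup(input_rows):
    """Normal lookup: load the reference table once, then probe it in memory."""
    cur = con.cursor()
    ref = dict(cur.execute("SELECT id, descr FROM ref_table"))
    for row in input_rows:
        yield {**row, "descr": ref.get(row["id"])}

With a handful of input rows, the per-row queries of the sparse version cost little; with a million input rows they become a million round trips to the database, and loading the reference once (or pushing the whole thing into a database-side join) is usually the better choice.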

Posted: Mon Jun 09, 2008 6:36 am
by ArndW
If your lookup data changes once a month, then it makes sense to create a lookup file set and use that on each run. If the reference data changes daily, or on every run of the job, you can save the resources required to build the lookup file set and do the lookup directly against the table. This is just one of several choices you can make.

The goals of minimizing runtime and minimizing resource use are usually diametrically opposed. It is like expecting a car's fuel economy to improve the faster you drive; it doesn't work that way.

Minimizing resource use per row processed is a possible goal, but it won't necessarily maximize throughput.
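For illustration only, here is a minimal Python sketch of that "build the lookup once and reuse it" idea; the cache file name, the 30-day refresh rule, and the build_reference function are assumptions standing in for building a lookup file set on a schedule rather than on every run.

Code:

import os
import pickle
import time

REF_CACHE = "ref_lookup.pkl"   # hypothetical stand-in for a prebuilt lookup file set

def load_reference(build_reference, max_age_days=30):
    """Reuse the cached lookup while it is fresh; otherwise rebuild it from the database."""
    fresh = (os.path.exists(REF_CACHE)
             and time.time() - os.path.getmtime(REF_CACHE) < max_age_days * 86400)
    if fresh:
        with open(REF_CACHE, "rb") as f:
            return pickle.load(f)
    ref = build_reference()        # e.g. SELECT the whole reference table into a dict
    with open(REF_CACHE, "wb") as f:
        pickle.dump(ref, f)
    return ref

If the reference data changes on every run, the cache never pays for itself and querying the table directly is the simpler choice, which is the point above.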

Posted: Mon Jun 09, 2008 6:02 pm
by vmcburney
My recollection, from when I looked at the files created while the job runs, is that the job runs the same way with both methods but the job startup differs. If you already have a lookup file set, the job starts processing straight away. If you don't have a lookup file set and the lookup data cannot all fit into RAM, the job creates the equivalent of lookup file sets for you in the node scratch space. So for very large lookups you end up with a lookup file set either way; it just depends on whether you want to create it every time you run the job or create it less often, on a schedule.
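As a loose analogy only (plain Python, not what the engine actually writes to scratch space), a lookup that is too big for memory can be kept in a disk-backed key/value store and probed per key:

Code:

import shelve

def build_disk_lookup(ref_rows, path="scratch_lookup"):
    """Write the reference data to a disk-backed store, roughly analogous to a
    lookup file set built in the node's scratch space."""
    with shelve.open(path) as store:
        for key, value in ref_rows:
            store[str(key)] = value   # shelve keys must be strings
    return path

def probe(path, key):
    """Look up a single key against the disk-backed store."""
    with shelve.open(path) as store:
        return store.get(str(key))

Building that store once ahead of time, rather than letting it be rebuilt on every run, is exactly the schedule-versus-every-run choice described above.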

Posted: Tue Jun 10, 2008 1:04 am
by yuva010
Hi,
The decision I have to make, for the least runtime and the least resource utilization, depends on the source/input volume and the lookup volume:

                        Lookup volume
Source/IP volume        Low             Moderate          Large
Low                     Direct table    Fileset or DT?    Fileset or DT?
Moderate
Large