Lookup direct from Oracle database

nick.bond · Post by **nick.bond** » Wed Jun 14, 2006 10:32 pm

When I have a lookup stage with reference link direct to Oracle stage what exactly happens in the background?

Does this simply create a temporay lookup fileset which is then loaded into memory in "entire" mode on all nodes? or Does it create partitioned lookup filesets? Or does it not really create lookup filesets at all?

If a lookup fileset is only required for one job (MainJob) and will be recreated just before the each run of MainJob, will I get the same performance by having a separate job that loads the lookup fileset and then MainJob uses that lookup fileset compared to the easyier implementation of having the lookup stage connected directly to the Oracle stage?

kumar_s · Post by **kumar_s** » Wed Jun 14, 2006 11:17 pm

If you reference oracle stage directly, it doesnt create any fileset, (hope you are not talking about virtual dataset). Search for Sparse lookup data will get looked up directly from the database.

ray.wurlod · Post by **ray.wurlod** » Wed Jun 14, 2006 11:52 pm

Without a sparse lookup the entire table is loaded into a virtual Data Set against which lookups are performed.

nick.bond · Post by **nick.bond** » Thu Jun 15, 2006 12:44 am

i guess i was talking about the virtual dataset.

Forgetting the overhead in creating this virtual dataset or the overhead creating the lookup fileset, once you had them is there a large performance differnece between performing lookups on the virtual dataset rather than the lookup fileset?

1) With small amounts of lookup data < 100,000 rows
2) With large volumes in the lookup data > 10,000,000

Is the physical implementation of the oracle stage/virtual dataset and the lookup stage very different from the lookup fileset and the lookup stage?

nick.bond · Post by **nick.bond** » Thu Jun 15, 2006 12:46 am

Is the physical implementation of the oracle stage/virtual dataset and the lookup stage very different from the lookup fileset and the lookup stage?

For the above I mean in terms of what happens when the job runs not how one implements this in the job.

ray.wurlod · Post by **ray.wurlod** » Thu Jun 15, 2006 2:10 am

It's more to do with the number of distinct values that are likely to be looked up. If this is "small" (they have never quantified what this means), then a sparse lookup is likely to be beneficial.

kumar_s · Post by **kumar_s** » Thu Jun 15, 2006 4:49 am

Also depends on the database conectivity, it is costly to fetch value for each record, it is considrable to have no sparse lookup. Database available in tele network will be an example.