Lookup Performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

Lookup Performance

Post by bapajju »

Hi,
I tried a lookup using a hashed file stage for 5,000 records. I tried the same lookup through an OraOCI stage, and both effectively took the same time. But when I tried the same sort of experiment with 4,000,000 records, I found that the hashed file stage is far, far faster than the OraOCI stage.
I just wanted to know whether there is any threshold, in terms of number of records or number of bytes, beyond which the difference in performance between a hashed file lookup and a plain OraOCI lookup becomes noticeable.
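
For intuition, here is a minimal cost-model sketch in plain Python (not DataStage code). The timings are assumptions for illustration, not measurements, but they show why the gap only opens up at volume: the hashed file pays a one-time build cost and then cheap local probes, while the database lookup pays a round trip on every row.

Code: Select all

# Minimal cost-model sketch in plain Python (not DataStage code).
# All timings below are assumptions for illustration, not measured values.

HASH_BUILD_S = 10.0      # assumed one-time cost to unload the table into a hashed file
HASH_PROBE_S = 0.000005  # assumed in-memory probe cost per row
OCI_PROBE_S  = 0.0005    # assumed network round trip + execution cost per row

def hashed_file_time(rows):
    # pay the build cost once, then cheap local probes
    return HASH_BUILD_S + rows * HASH_PROBE_S

def oci_time(rows):
    # every row pays the full database round trip
    return rows * OCI_PROBE_S

for rows in (5000, 4000000):
    print("%9d rows: hashed file %8.1fs, OraOCI %8.1fs"
          % (rows, hashed_file_time(rows), oci_time(rows)))

With these made-up costs the break-even point sits near 20,000 rows; change the assumptions and it moves, which is why no fixed threshold can be quoted.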

Thanks :)
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

Hi Bapu,

Rules cannot be given. Even with 4 million records there are scenarios where OCI is preferable to a hashed file, and vice versa. Only rules of thumb can be offered.

Besides that, you can increase the performance of hashed files by choosing a more optimal hashed file type.
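
To illustrate the point about file types, here is a toy sketch in plain Python (not the actual UniVerse hashing algorithms; the keys and modulo values are invented). A hashed file spreads keys across groups, and a type/modulo that fits the key pattern badly piles keys into a few groups, so every probe then scans a long overflow chain.

Code: Select all

# Illustrative sketch of why hashed file type/modulo matters (plain Python,
# not the real UniVerse algorithms). Keys and modulo values are invented.
from collections import Counter

keys = ["%08d" % n for n in range(0, 40000, 4)]  # sequential keys, step 4

def group_of(key, modulo):
    # toy hashing: numeric key modulo the group count
    return int(key) % modulo

for modulo in (4, 9973):  # 4 divides the key step: a pathological choice
    counts = Counter(group_of(k, modulo) for k in keys)
    print("modulo %5d: %5d groups used, largest group holds %5d keys"
          % (modulo, len(counts), max(counts.values())))

With modulo 4 all 10,000 keys land in a single group; with a prime such as 9973 they spread almost evenly, so probes stay cheap.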

Wolfgang
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

A one-row hashed file should beat a one-row table every time for a reference lookup. A hashed file is a local reference object on the server; a table suffers network overhead plus query preparation plus results fetch plus spool. A hashed file can preload to memory, making every direct read come from memory, whereas a table has only the query cache to assist it. A hashed file does not suffer when many instances of the same job hit the reference lookup (in memory, and/or shared in memory, there is no extra burden), whereas many instances doing a database reference lookup will congest around the database.

Unless you're doing a real-time type of solution, for batch processing you should try to use hashed file lookups. You'll find that the same table being used in multiple jobs for reference purposes can easily be put into a hashed file that all jobs share. You'll hit the database once to unload the table, and after that all usage is local to the server. That should be the preference in your architecture: remove outside influences (databases) from your transformation.
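
A sketch of that unload-once pattern in plain Python, with sqlite3 standing in for Oracle (the table and its contents are invented):

Code: Select all

# "Unload once, look up locally" pattern sketched in plain Python.
# sqlite3 stands in for Oracle; the table and its contents are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex"), (3, "Initech")])

# one query to unload the reference table ...
ref = dict(conn.execute("SELECT cust_id, name FROM customer"))

# ... then every row probes the local copy; no further database traffic
for cust_id in (2, 3, 99):
    print(cust_id, ref.get(cust_id, "** no match **"))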
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

Post by bapajju »

Thanks, Bland :). I got a fair idea.