Hi,
I tried a lookup using a hashed file stage for 5,000 records. I tried the same lookup through an OraOCI stage, and both effectively took the same time. But when I tried the same sort of experiment with 4,000,000 records, I found that the hashed file stage was far, far faster than OraOCI.
I just wanted to know whether there is any threshold, in terms of number of records or number of bytes, beyond which the performance difference between a hashed file lookup and a general OraOCI lookup becomes observable.
A one row hashed file should beat a one row table every time for a reference lookup. A hashed file is a local reference object on the server; a table suffers network overhead plus query preparation plus result fetch plus spool. A hashed file can preload to memory, making every direct read come from memory, whereas a table has only the query cache to assist it. A hashed file does not suffer when many instances of the same job hit the reference lookup (in memory, and/or shared in memory, is no burden), whereas many instances doing a database reference lookup will congest around the database.
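The cost gap described above can be illustrated outside of DataStage. This is just a rough sketch, not DataStage code: sqlite3 stands in for the Oracle table (and since sqlite3 is in-process, a real network round trip per row would make the gap even wider), and a plain Python dict stands in for the preloaded hashed file.

```python
import sqlite3
import time

# Stand-in reference table; in real life, network latency per query
# widens the gap further.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ref (key INTEGER PRIMARY KEY, val TEXT)")
db.executemany("INSERT INTO ref VALUES (?, ?)",
               [(i, f"value-{i}") for i in range(100_000)])
db.commit()

keys = list(range(100_000))

# Per-row database lookup: one query executed per input row.
start = time.perf_counter()
for k in keys:
    db.execute("SELECT val FROM ref WHERE key = ?", (k,)).fetchone()
db_time = time.perf_counter() - start

# Preloaded lookup: hit the database once, then every read is a
# local in-memory probe (the hashed-file-in-memory pattern).
start = time.perf_counter()
ref = dict(db.execute("SELECT key, val FROM ref"))
for k in keys:
    ref[k]
dict_time = time.perf_counter() - start

print(f"per-row queries: {db_time:.3f}s, preloaded dict: {dict_time:.3f}s")
```

On any machine the preloaded pass should finish well ahead of the per-row pass, and unlike the database it costs nothing extra when several processes do the same thing at once.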
Unless you're doing a real-time type of solution, for batch processing you should try to use hashed file lookups. You'll find that the same table being used in multiple jobs for reference purposes can easily be put into a hashed file that all jobs share. You hit the database once to unload the table, and after that all usage is local to the server. That should be the preference in your architecture: remove outside influences (databases) from your transformation.
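The unload-once, read-many pattern can be sketched with Python's `dbm` module, which is itself a hashed lookup file on disk; again this is only an illustrative stand-in for a DataStage hashed file, with sqlite3 playing the source table.

```python
import dbm.dumb  # portable dbm backend; a rough stand-in for a hashed file
import os
import sqlite3
import tempfile

# "Unload" job: hit the database exactly once and write a local hashed file.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ref (key INTEGER PRIMARY KEY, val TEXT)")
db.executemany("INSERT INTO ref VALUES (?, ?)",
               [(i, f"v{i}") for i in range(1_000)])

path = os.path.join(tempfile.mkdtemp(), "ref_lookup")
with dbm.dumb.open(path, "c") as hf:
    for key, val in db.execute("SELECT key, val FROM ref"):
        hf[str(key)] = val  # dbm stores keys and values as bytes
db.close()

# Any number of downstream "jobs" can now open the file read-only and
# do their lookups locally, with no further load on the database.
with dbm.dumb.open(path, "r") as hf:
    looked_up = hf[b"42"]
print(looked_up)
```

The point is architectural rather than about `dbm` itself: one job pays the database cost, every other job reads a local file, so the database drops out of the transformation path entirely.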
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle