Lookup Performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

Lookup Performance

Post by bapajju »

Hi,
I tried a lookup using a hashed file stage for 5,000 records. I tried the same lookup through an OraOCI stage, and both effectively took the same time. But when I tried the same sort of experiment with 4,000,000 records, I found that the hashed file stage is far, far faster than the OraOCI stage.
I just wanted to know whether there is any threshold, in terms of number of records or number of bytes, beyond which the difference in performance between a hashed file lookup and a plain OraOCI lookup becomes noticeable.
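
For intuition, here is a minimal cost-model sketch in plain Python (not DataStage code). The timings are assumptions for illustration, not measurements, but they show why the gap only opens up at volume: the hashed file pays a one-time build cost and then cheap local probes, while the database lookup pays a round trip on every row.

Code: Select all

# Minimal cost-model sketch in plain Python (not DataStage code).
# All timings below are assumptions for illustration, not measured values.

HASH_BUILD_S = 10.0      # assumed one-time cost to unload the table into a hashed file
HASH_PROBE_S = 0.000005  # assumed in-memory probe cost per row
OCI_PROBE_S  = 0.0005    # assumed network round trip + execution cost per row

def hashed_file_time(rows):
    # pay the build cost once, then cheap local probes
    return HASH_BUILD_S + rows * HASH_PROBE_S

def oci_time(rows):
    # every row pays the full database round trip
    return rows * OCI_PROBE_S

for rows in (5000, 4000000):
    print("%9d rows: hashed file %8.1fs, OraOCI %8.1fs"
          % (rows, hashed_file_time(rows), oci_time(rows)))

With these made-up costs the break-even point sits near 20,000 rows; change the assumptions and it moves, which is why no fixed threshold can be quoted.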

Thanks :)
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

Hi Bapu,

Rules cannot be given. Even with 4 million records there are scenarios where OCI is preferable to a hashed file, and vice versa. Only rules of thumb can be offered.

Besides that, you can increase the performance of hashed files by choosing a more optimal hashed file type.
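
To illustrate the point about file types, here is a toy sketch in plain Python (not the actual UniVerse hashing algorithms; the keys and modulo values are invented). A hashed file spreads keys across groups, and a type/modulo that fits the key pattern badly piles keys into a few groups, so every probe then scans a long overflow chain.

Code: Select all

# Illustrative sketch of why hashed file type/modulo matters (plain Python,
# not the real UniVerse algorithms). Keys and modulo values are invented.
from collections import Counter

keys = ["%08d" % n for n in range(0, 40000, 4)]  # sequential keys, step 4

def group_of(key, modulo):
    # toy hashing: numeric key modulo the group count
    return int(key) % modulo

for modulo in (4, 9973):  # 4 divides the key step: a pathological choice
    counts = Counter(group_of(k, modulo) for k in keys)
    print("modulo %5d: %5d groups used, largest group holds %5d keys"
          % (modulo, len(counts), max(counts.values())))

With modulo 4 all 10,000 keys land in a single group; with a prime such as 9973 they spread almost evenly, so probes stay cheap.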

Wolfgang
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

A one-row hashed file should beat a one-row table every time for a reference lookup. A hashed file is a local reference object on the server; a table suffers network overhead plus query preparation plus results fetch plus spool. A hashed file can preload to memory, making every direct read come from memory, whereas a table has only the query cache to assist it. A hashed file does not suffer when many instances of the same job hit the reference lookup (in memory, and/or shared in memory, there is no extra burden), whereas many instances doing a database reference lookup will congest around the database.

Unless you're doing a real-time type of solution, for batch processing you should try to use hashed file lookups. You'll find that the same table being used in multiple jobs for reference purposes can easily be put into a hashed file that all jobs share. You'll hit the database once to unload the table, and after that all usage is local to the server. That should be the preference in your architecture: remove outside influences (databases) from your transformation.
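
A sketch of that unload-once pattern in plain Python, with sqlite3 standing in for Oracle (the table and its contents are invented):

Code: Select all

# "Unload once, look up locally" pattern sketched in plain Python.
# sqlite3 stands in for Oracle; the table and its contents are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex"), (3, "Initech")])

# one query to unload the reference table ...
ref = dict(conn.execute("SELECT cust_id, name FROM customer"))

# ... then every row probes the local copy; no further database traffic
for cust_id in (2, 3, 99):
    print(cust_id, ref.get(cust_id, "** no match **"))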
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

Post by bapajju »

Thanks, Bland :). I got a fair idea.