
Any other option to minimise look ups

Posted: Tue Jan 27, 2004 11:57 pm
by raj_cipher
Hi ,

I have heard that using a large number of lookups affects performance (speed).
In my case I am using an ODBC stage for the lookups. The tables I look up against hold millions of records, and there are around 15 lookups. Is there any way to minimise the lookups?

Posted: Wed Jan 28, 2004 12:05 am
by ray.wurlod
Pre-load the lookup tables to memory. You can't do this with ODBC; you will need to use hashed files.
Pre-allocate disk for the hashed files (use the MINIMUM.MODULUS option; to get a value, use the Hashed File Calculator on your DataStage CD).
Loading the hashed files won't take long (minutes), though you will need to design some simple jobs (15, was it?) to populate them.
Memory access is AT LEAST 1000 TIMES faster than disk access, which is what ODBC lookups require.
It's definitely worth doing.
All aspects of this have been covered in the past on this forum; using its search facility will help you fill in the gaps in my brief overview.
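To make the idea concrete, here is a minimal sketch in Python (the table and column names are borrowed from later posts in this thread, not the actual schema): the whole reference table is read into an in-memory map once, and each source row then probes memory instead of issuing a query per row, which is the effect that pre-loading a hashed file to memory gives you in DataStage.

```python
import sqlite3

# Illustrative reference table; names and contents are assumptions.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cat_tip (cod_tip TEXT PRIMARY KEY, des_tip TEXT)")
con.executemany("INSERT INTO cat_tip VALUES (?, ?)",
                [("A", "Savings"), ("B", "Checking")])

# Pre-load the entire lookup table into memory once,
# analogous to pre-loading a hashed file in DataStage.
lookup = dict(con.execute("SELECT cod_tip, des_tip FROM cat_tip"))

# Per-row processing now probes memory instead of querying the
# database once per row; unmatched keys come back as None.
rows = [("C001", "A"), ("C002", "B"), ("C003", "Z")]
enriched = [(cli, tip, lookup.get(tip)) for cli, tip in rows]
print(enriched)
```

The one-time cost of loading the map is paid back as soon as the row count is large, because each subsequent probe avoids a round trip to the database.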

Re: Any other option to minimise look ups

Posted: Wed Jan 28, 2004 1:20 am
by jseclen
Hi,

Another way is to create a query that combines most of the lookups you need.

For example, instead of these two queries:

select cod_cli, cod_cta, cod_tip from client_table

select cod_tip, des_tip from cat_tip

join both tables in a single query:
select a.cod_cli, a.cod_cta, a.cod_tip, b.des_tip
from client_table a, cat_tip b
where a.cod_tip = b.cod_tip

With this query you don't need the lookups in DataStage. :lol:

Re: Any other option to minimise look ups

Posted: Wed Jan 28, 2004 5:23 am
by ogmios
My 2 cents would be to try to get your lookups included in the input query (calculating the data in the database, not via DataStage). I only use database lookups or hashed files for very large dimensions, or for tables that are in a different database than the input query.

In 75% of cases a properly tuned database will outrun DataStage for these kinds of lookups... try it with a test case.

My experience is also that, although hashed files are of course faster than ODBC lookups, having a lot of big hashed files on your system tends to thrash memory and slow down the whole system (Solaris / DataStage 6.0.x / 2 GB memory).

Use the database... one of its main purposes is doing these kinds of joins (left outer join the tables if you want to mimic lookups).
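As a sketch of the left-outer-join approach (using SQLite here for a runnable example, with the column names from the earlier post; the schemas and data are assumptions): a LEFT OUTER JOIN keeps every input row, so rows with no matching reference come through with a NULL description rather than being dropped, which is the behaviour a lookup gives you.

```python
import sqlite3

# Illustrative tables based on the column names used earlier in the
# thread; the actual schemas and data are assumptions.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE client_table (cod_cli TEXT, cod_cta TEXT, cod_tip TEXT);
CREATE TABLE cat_tip (cod_tip TEXT, des_tip TEXT);
INSERT INTO client_table VALUES ('C001', 'T1', 'A'), ('C002', 'T2', 'Z');
INSERT INTO cat_tip VALUES ('A', 'Savings');
""")

# LEFT OUTER JOIN preserves every client row; the unmatched
# cod_tip 'Z' yields NULL (None in Python) for des_tip.
rows = con.execute("""
    SELECT a.cod_cli, a.cod_cta, a.cod_tip, b.des_tip
    FROM client_table a
    LEFT OUTER JOIN cat_tip b ON a.cod_tip = b.cod_tip
""").fetchall()
print(rows)
```

An inner join, by contrast, would silently drop the 'C002' row, so the outer join is the one that mimics a lookup's pass-through behaviour.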

Posted: Wed Jan 28, 2004 8:20 am
by kcbland
If you are building a data warehouse in batch mode:

Typically you have surrogate keys, so you have the common task of switching natural keys for their warehouse-assigned surrogates. This requires source data to reference target tables for surrogate foreign key substitution.

Since the same substitution for the same dimension is done over and over in so many jobs during a single load cycle (production dimensions, time dimensions, customer dimensions, code dimensions, etc.), because those dimensions are referenced by a lot of tables, this repetitious lookup is burdensome on a database.

This is why you prepare hashed lookups, using all of the functionality meant to eliminate the burden of repetitious and shared lookups. I populate terabyte data warehouses using just plain old Server. We know what we're talking about. It's a balance of functionality and performance with maintenance.
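A minimal sketch of the surrogate-key substitution described above (the dimension contents, key names, and the -1 "unknown" convention are illustrative assumptions, not anyone's actual warehouse design): the dimension's natural-to-surrogate mapping is prepared once, then every fact row swaps its natural key for the surrogate before loading.

```python
# Hypothetical map of natural key -> warehouse-assigned surrogate key
# for one dimension, as would be pre-loaded from a hashed file.
customer_dim = {"CUST-100": 1, "CUST-200": 2}

# A common convention: natural keys with no dimension row map to a
# reserved "unknown" surrogate instead of failing the load.
UNKNOWN = -1

# Fact rows arrive carrying natural keys; substitute each for its
# surrogate so the fact table stores only surrogate foreign keys.
source_rows = [("CUST-100", 9.99), ("CUST-300", 5.00)]
fact_rows = [(customer_dim.get(nk, UNKNOWN), amount)
             for nk, amount in source_rows]
print(fact_rows)
```

Because every fact job in the load cycle repeats this same probe against the same dimension, preparing the map once and sharing it is what saves the database from doing the join again and again.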