Hi ,
I got the info that using more no. of look ups affects perfomance(Speed).
In my case,i am using odbc stage for look ups.The records that i look up is in millions.The no. of look ups is also around 15.Is there any way to minimise look ups ?
Any other option to minimise look ups
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 90
- Joined: Mon Dec 08, 2003 4:48 am
- Location: Chennai
Any other option to minimise look ups
Think Ahead,
Raj.D
Raj.D
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Pre-load the lookup tables to memory. You can't do this with ODBC; you will need to use hashed files.
Pre-allocate disk for the hashed files (use MINIMUM.MODULUS option, to get a value use the Hashed File Calculator on your DataStage CD).
Loading the hashed files won't take long (minutes). Plus you need to design some simple jobs (15, was it?) to accomplish same.
Memory access speed is AT LEAST 1000 TIMES faster than disk access, which is what you need for ODBC.
It's definitely worth doing.
All aspects of this have been covered in the past on this forum, using its search facility will help you to fill in the gaps of my brief overview.
Pre-allocate disk for the hashed files (use MINIMUM.MODULUS option, to get a value use the Hashed File Calculator on your DataStage CD).
Loading the hashed files won't take long (minutes). Plus you need to design some simple jobs (15, was it?) to accomplish same.
Memory access speed is AT LEAST 1000 TIMES faster than disk access, which is what you need for ODBC.
It's definitely worth doing.
All aspects of this have been covered in the past on this forum, using its search facility will help you to fill in the gaps of my brief overview.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 133
- Joined: Wed Mar 05, 2003 4:19 pm
- Location: Lima - Peru. Sudamerica
- Contact:
Re: Any other option to minimise look ups
Hi,
Another way is create a query that contain almost lookups that you need.
>For example:
select cod_cli, cod_cta, cod_tip from client_table
select cod_tip, des_tip from cat_tip
in your query join both tables:
select a.cod_cli, a.cod_cta, a.cod_tip, b.des_tip
from client_table a, cat_tip b
where a.cod_tip = b.cod_tip
with this query you don't need lookups in DataStage.
Another way is create a query that contain almost lookups that you need.
>For example:
select cod_cli, cod_cta, cod_tip from client_table
select cod_tip, des_tip from cat_tip
in your query join both tables:
select a.cod_cli, a.cod_cta, a.cod_tip, b.des_tip
from client_table a, cat_tip b
where a.cod_tip = b.cod_tip
with this query you don't need lookups in DataStage.
Saludos,
Miguel Seclén
Lima - Peru
Miguel Seclén
Lima - Peru
Re: Any other option to minimise look ups
My 2 cents would be to try to get your lookups included in the input query (calculating the data in the database not via DataStage). I only use database lookups or hashfiles for very large dimensions that are in other databases as the input-query.
In 75% of the cases a properly tuned database will outrun DataStage anytime for these kind of lookups... try it with a testcase.
My experience is also that although hash-files are of course faster than ODBC lookups having a lot of big hash-files on your system tends to trash the memory slowing down the complete system (Solaris/Datastage 6.0.x/2Gb Memory).
Use the database... one if its main purposes is doing these kind of joins (left outer join the tables if you want to mimic lookups).
In 75% of the cases a properly tuned database will outrun DataStage anytime for these kind of lookups... try it with a testcase.
My experience is also that although hash-files are of course faster than ODBC lookups having a lot of big hash-files on your system tends to trash the memory slowing down the complete system (Solaris/Datastage 6.0.x/2Gb Memory).
Use the database... one if its main purposes is doing these kind of joins (left outer join the tables if you want to mimic lookups).
If you are building a data warehouse in batch mode:
Typically, you have surrogate keys. Therefore, you have a common task of switching natural keys for their warehouse assigned surrogate. This requires source data to reference target tables for surrogate foreign key substitution.
Since this is done over and over and over and over in soooo many jobs, doing the same subsitution for the same dimension soooo many times during a single load cycle (production dimensions, time dimensions, customer dimensions, code dimensions, etc) are in a lot of tables. This repititous lookup is burdensome on a database.
This is why you can prepare hash lookups, using all of the functionality meant to eliminate the burden of repititous and shared lookup. I populate terabyte data warehouses using just plain old Server. We know what we're talking about. It's a balance of functionality and performance with maintenance.
Typically, you have surrogate keys. Therefore, you have a common task of switching natural keys for their warehouse assigned surrogate. This requires source data to reference target tables for surrogate foreign key substitution.
Since this is done over and over and over and over in soooo many jobs, doing the same subsitution for the same dimension soooo many times during a single load cycle (production dimensions, time dimensions, customer dimensions, code dimensions, etc) are in a lot of tables. This repititous lookup is burdensome on a database.
This is why you can prepare hash lookups, using all of the functionality meant to eliminate the burden of repititous and shared lookup. I populate terabyte data warehouses using just plain old Server. We know what we're talking about. It's a balance of functionality and performance with maintenance.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle