Hi,
I hv to select records from various tables.
I can use SQL Join query or use look-up to acheive this.
The volume of the records is in millions.
The count of the tables i'm going to use will be around seven.
Which of the following is a better option ?
Using a join query or using ODBC look-ups.
which is better - join query or odbc lookup ?
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 90
- Joined: Mon Dec 08, 2003 4:48 am
- Location: Chennai
which is better - join query or odbc lookup ?
Think Ahead,
Raj.D
Raj.D
I don't think any DataStage question can be answered with "it's better to use ODBC lookups".
Your question should probably be comparing Joins with HASH lookups.
My general Rule of Thumb is to say - if it can be done in the database, do it in the database. I'm curious what others think, but if I can make Oracle do the work... I let it! Especially something like avoiding a lookup when a simple join in the sourcing query can take its place.
![Laughing :lol:](./images/smilies/icon_lol.gif)
My general Rule of Thumb is to say - if it can be done in the database, do it in the database. I'm curious what others think, but if I can make Oracle do the work... I let it! Especially something like avoiding a lookup when a simple join in the sourcing query can take its place.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I think it's worth clarifying the reasons for the unanimity.
First, there's nothing technically wrong with using the ODBC stage to perform reference lookups. It will work. But it will work very slowly, leading probably to unacceptable throughput.
Pre-loading hashed files and performing the reference lookups therein is a much more performant solution than ODBC; it doesn't take long to load them with a simple SELECT from the source database.
However, you then have to process every row from the driver table.
The advantage of performing the join at source is that you already have exactly the rows and columns that you need and - here's the crux of the argument - no redundant rows for DataStage to process.
To summarise.
First, there's nothing technically wrong with using the ODBC stage to perform reference lookups. It will work. But it will work very slowly, leading probably to unacceptable throughput.
Pre-loading hashed files and performing the reference lookups therein is a much more performant solution than ODBC; it doesn't take long to load them with a simple SELECT from the source database.
However, you then have to process every row from the driver table.
The advantage of performing the join at source is that you already have exactly the rows and columns that you need and - here's the crux of the argument - no redundant rows for DataStage to process.
To summarise.
- When the tables to be joined are in the same database instance, perform the join at source.
When the tables are in different database instances, pre-load hashed files and perform reference lookups to them.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.