which is better - join query or odbc lookup ?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
raj_cipher
Participant
Posts: 90
Joined: Mon Dec 08, 2003 4:48 am
Location: Chennai

which is better - join query or odbc lookup ?

Post by raj_cipher »

Hi,

I hv to select records from various tables.
I can use SQL Join query or use look-up to acheive this.
The volume of the records is in millions.
The count of the tables i'm going to use will be around seven.

Which of the following is a better option ?

Using a join query or using ODBC look-ups.
Think Ahead,
Raj.D
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I don't think any DataStage question can be answered with "it's better to use ODBC lookups". :lol: Your question should probably be comparing Joins with HASH lookups.

My general Rule of Thumb is to say - if it can be done in the database, do it in the database. I'm curious what others think, but if I can make Oracle do the work... I let it! Especially something like avoiding a lookup when a simple join in the sourcing query can take its place.
-craig

"You can never have too many knives" -- Logan Nine Fingers
essaion
Participant
Posts: 18
Joined: Tue Nov 04, 2003 8:55 am
Contact:

Post by essaion »

I totally agree with Craig (for what is worth :)
Aurelien
nsm
Premium Member
Premium Member
Posts: 139
Joined: Mon Feb 09, 2004 8:58 am

Post by nsm »

Its better to go for Join Query..

nsm.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I think it's worth clarifying the reasons for the unanimity.

First, there's nothing technically wrong with using the ODBC stage to perform reference lookups. It will work. But it will work very slowly, leading probably to unacceptable throughput.

Pre-loading hashed files and performing the reference lookups therein is a much more performant solution than ODBC; it doesn't take long to load them with a simple SELECT from the source database.

However, you then have to process every row from the driver table.

The advantage of performing the join at source is that you already have exactly the rows and columns that you need and - here's the crux of the argument - no redundant rows for DataStage to process.

To summarise.
  • When the tables to be joined are in the same database instance, perform the join at source.
    When the tables are in different database instances, pre-load hashed files and perform reference lookups to them.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply