How to retrieve multiple rows using lookup table

tkozlows · Post by **tkozlows** » Mon Mar 14, 2005 9:35 pm

I need to retrieve multiple rows from an Oracle master database using records stored in another (lookup) Oracle database. Essentially, each record in the lookup table will have multiple corresponding rows in the master table. All rows retrieved from the master table need to be then stored in another Oracle table.

I would greatly appreciate if anyone could point me in the right direction.

Thank you kindly
Thomas

manishsk · Post by **manishsk** » Mon Mar 14, 2005 9:48 pm

You can use oracle plugin as a lookup directly. Another way is I think you need to create a database link(from the master database) to the other database and use oracle plugin to access the table from master database. Datbase link is needed if the lookup table in in another database.

Hope that helps.

Regards,
Manish

tkozlows wrote:I need to retrieve multiple rows from an Oracle master database using records stored in another (lookup) Oracle database. Essentially, each record in the lookup table will have multiple corresponding rows in the master table. All rows retrieved from the master table need to be then stored in another Oracle table.

I would greatly appreciate if anyone could point me in the right direction.

Thank you kindly
Thomas

ray.wurlod · Post by **ray.wurlod** » Mon Mar 14, 2005 10:11 pm

The ODBC stage type is equipped for multi-row returns from lookups.
Invoke help, and search for Multi.

tkozlows · Post by **tkozlows** » Mon Mar 14, 2005 10:27 pm

Thank you, but I do not have an ODBC connectivity between DS and the Oracle servers.

chulett · Post by **chulett** » Mon Mar 14, 2005 10:37 pm

But you could if you really needed it. All you have to do is set it up, the drivers come with the product.

kcbland · Post by **kcbland** » Mon Mar 14, 2005 10:37 pm

Maybe we can help point you in a direction that better utilizies the features in DataStage. Can we get an idea of row counts in the source and reference tables? There are alternatives, such as high-performance hash files that provide reference capabilities, and maybe just approaching your design dilemma with a different perspective can change your solution.

Can you give specific table names and tell us what you need to achieve? Multi-row lookups are usually found in either slowly-changing dimension type lookups, such as ranged queries using a BETWEEN statement, or in situations where you can't explicitly identify the foreign key relationship. Otherwise, folks tend to think of the "master" as the driving table, instead of the "detail" as the driving table. Simply turning that around achieves what people want, plus enables all of the high-performance capabilities in DataStage.

tkozlows · Post by **tkozlows** » Mon Mar 14, 2005 10:46 pm

The master table holds approx 70M item records. My lookup table has about 1M product records. Using the product records in the lookup table, I expect to extract approx 3-4M item records from the master table. I hope this helps.

kcbland · Post by **kcbland** » Mon Mar 14, 2005 10:50 pm

The item master table has a product foreign key? Are you saying that you wish to select from the item master based on the product keys in the lookup table? Would pseudo-sql look something like:

Code: Select all

select * from item_master where product key in (select product_key from product_lookup)

If this is the case, your job design choice of ODBC or OCI lookups based on your row counts will be extremely slow. Are the two tables on separate instances, or co-located?

tkozlows · Post by **tkozlows** » Mon Mar 14, 2005 11:02 pm

That's right. The product_key in product_lookup is a subset of the product_key in item_master, hence there are multiple records in item_master for each record in product_lookup.

The two tables 'live' in separate database instance (and separate physical hosts).

kcbland · Post by **kcbland** » Mon Mar 14, 2005 11:45 pm

The problem with using an OCI/ODBC lookup is that it will take forever, basically all 70 million rows in the item master table will generate a SQL query against the product table. You do the math, that's 70 million queries. All 70 million rows have to exit the item master instance, traverse the network, enter a DS transformer, generate the query, wait for the response, and then check the result. Now do all of that 70 million times, only allowing the 3-4 million rows you're interested in pass thru.

Is there any chance you can create a work table within the item master instance, say in a user work shema, and pull the product keys you're interested in from the product lookup table and put there. Then, doing a constraining inner-join, only pull the rows you want? This is BY FAR the best solution.

If you cannot get a work table, your choices become very limited. You will be required to pull all 70 million rows from the item master across to the DataStage server, and throw away the ones you don't need. Instead of an OCI/ODBC lookup, simply build a hash lookup just containing the product keys you want, and constrain against that reference lookup. You'll find that a lot faster because you only hit the network and db once to pull the product keys from the lookup table into the hash file. You can then use hash file caching to accelerate performance.

You'll want to instantiate the job extracting the data from the item master table, using some sort of data partitioning scheme on the WHERE clause to create multiple connections to the database. Use multiple job instances to handle this, so that you get the 70 million rows out as fast as possible.

Good luck.

tkozlows · Post by **tkozlows** » Tue Mar 15, 2005 2:37 pm

Great. Thank you very much for your help. :D

DSXchange

How to retrieve multiple rows using lookup table

How to retrieve multiple rows using lookup table

Re: How to retrieve multiple rows using lookup table