ODBC Lookup Performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

SharingDS
Participant
Posts: 12
Joined: Fri Apr 28, 2006 8:58 pm

ODBC Lookup Performance

Post by SharingDS »

Hi Gurus,

I need to look up against a huge data set, around 7 million records, and the performance is very, very slow.
I am doing the lookup directly against the table using an ODBC stage (since the data is huge, I can't use a hashed file). The table has only one key column for the lookup, and that key column is a VarChar of length 20 (alphanumeric).

I guess this performance problem is because the key column is a string of 20 alphanumeric characters.
Is there any alternative way to increase the performance even though the lookup key is a VarChar?

I have also tried the CRC32 function, converting the string into a numeric key, but it still gives poor performance.
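
To illustrate what I mean (the link and column names below are placeholders, not my real ones), the Transformer derivation I tried is roughly:

    CRC32(InLink.LOOKUP_KEY)

i.e. reducing the 20-character string to a 32-bit number before doing the lookup.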

Please help me out if there is any way to do this.

Thanks in Advance

Cheers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Of course you can use a hashed file. Seven million is not so many rows. You may even be able to fit it in a standard hashed file (up to 2GB of data). Be selective about what columns you load into the hashed file and what rows you load into the hashed file - only load the ones you need.
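
For example, the hashed file could be populated from a SELECT that names only the columns you actually need and filters out the rows you don't (table and column names here are invented for illustration):

    SELECT LOOKUP_KEY, NEEDED_COL1, NEEDED_COL2
    FROM REFERENCE_TABLE
    WHERE ACTIVE_FLAG = 'Y'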

If you still have "too much" data, make your hashed file 64-bit addressed.
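
If memory serves, an existing hashed file can be converted to 64-bit addressing from the TCL prompt with something along these lines (the file name is a placeholder; confirm the syntax for your release):

    RESIZE MyHashedFile * * * 64BIT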

If you have more than 19 million TB (the maximum capacity of a 64-bit hashed file) in your 7 million rows, then the inability to use a hashed file is the least of your problems.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: ODBC Lookup Performance

Post by chulett »

SharingDS wrote: I guess this performance problem is because the key column is a string of 20 alphanumeric characters.
No, it's all in the efficiency of your query. Compounded by the fact that it fires once for each record being processed. As noted, 7M records is NOT huge and should be easily handled by a hashed file - unless your records are extreeeemely... "wide".
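
If you do stay with the ODBC lookup, at the very least make sure the reference table has an index on the key column the lookup uses, something along these lines (table and column names made up):

    CREATE INDEX IX_REF_LOOKUP_KEY ON REFERENCE_TABLE (LOOKUP_KEY);

Without it, each of those per-row queries is likely a full table scan.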
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

7 million rows is barely worth processing. 7 billion rows is more like a healthy start. :lol: Server Edition is exceptionally suitable for all data volumes, given the right ETL design and hardware.

Some easy steps: only put into hashed files what you need, when you need it. Don't keep persistent hashed files (a copy of your target tables in hashed file form). Use multiple job instances with partitioning of the data to create multiple transformation pipelines. Use bulk loading for inserts, and contrive a bulk updating process.
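
As a sketch of the partitioning idea (parameter names are just examples), each instance of a multi-instance job can take its slice of the stream with a Transformer constraint like:

    Mod(@INROWNUM, pNumPartitions) = pPartitionNumber - 1

run with, say, pNumPartitions = 4 and pPartitionNumber = 1 through 4, one per instance.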
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle