ODBC Lookup Performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

SharingDS
Participant
Posts: 12
Joined: Fri Apr 28, 2006 8:58 pm

ODBC Lookup Performance

Post by SharingDS »

Hi Gurus,

I need to look up against a huge data set, around 7 million records, and the performance is very, very slow.
I am doing the lookup directly against the table using an ODBC stage (since the data is huge, I can't use a hashed file). The table has only one key column for the lookup, and that key column is a VarChar of length 20 (alphanumeric).

I guess this performance problem is because the key column is a string of 20 alphanumeric characters.
Is there any alternative way to increase the performance even though the lookup key is a VarChar?

I have also tried the CRC32 function, converting the string into a numeric key, but it still gives poor performance.
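
To illustrate what I mean (the link and column names below are placeholders, not my real ones), the Transformer derivation I tried is roughly:

    CRC32(InLink.LOOKUP_KEY)

i.e. reducing the 20-character string to a 32-bit number before doing the lookup.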

Please help me out if there is any way to do this.

Thanks in Advance

Cheers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Of course you can use a hashed file. Seven million is not so many rows. You may even be able to fit it in a standard hashed file (up to 2GB of data). Be selective about what columns you load into the hashed file and what rows you load into the hashed file - only load the ones you need.
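
For example, the hashed file could be populated from a SELECT that names only the columns you actually need and filters out the rows you don't (table and column names here are invented for illustration):

    SELECT LOOKUP_KEY, NEEDED_COL1, NEEDED_COL2
    FROM REFERENCE_TABLE
    WHERE ACTIVE_FLAG = 'Y'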

If you still have "too much" data, make your hashed file 64-bit addressed.
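
If memory serves, an existing hashed file can be converted to 64-bit addressing from the TCL prompt with something along these lines (the file name is a placeholder; confirm the syntax for your release):

    RESIZE MyHashedFile * * * 64BIT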

If you have more than 19 million TB (the maximum capacity of a 64-bit hashed file) in your 7 million rows, then the inability to use a hashed file is the least of your problems.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: ODBC Lookup Performance

Post by chulett »

SharingDS wrote: I guess this performance problem is because the key column is a string of 20 alphanumeric characters.
No, it's all in the efficiency of your query. Compounded by the fact that it fires once for each record being processed. As noted, 7M records is NOT huge and should be easily handled by a hashed file - unless your records are extreeeemely... "wide".
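
If you do stay with the ODBC lookup, at the very least make sure the reference table has an index on the key column the lookup uses, something along these lines (table and column names made up):

    CREATE INDEX IX_REF_LOOKUP_KEY ON REFERENCE_TABLE (LOOKUP_KEY);

Without it, each of those per-row queries is likely a full table scan.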
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

7 million rows is barely worth processing. 7 billion rows is more like a healthy start. :lol: Server Edition is exceptionally suitable for all data volumes, given the right ETL design and hardware.

Some easy steps: only put into hashed files what you need, when you need it. Don't keep persistent hashed files (a copy of your target tables in hashed file form). Use multiple job instances with partitioning of the data to create multiple transformation pipelines. Use bulk loading for inserts, and contrive a bulk updating process.
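
As a sketch of the partitioning idea (parameter names are just examples), each instance of a multi-instance job can take its slice of the stream with a Transformer constraint like:

    Mod(@INROWNUM, pNumPartitions) = pPartitionNumber - 1

run with, say, pNumPartitions = 4 and pPartitionNumber = 1 through 4, one per instance.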
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle