Hi Gurus,
I need to perform a lookup against a large data set, around 7 million records, and the performance is very slow.
I am doing the lookup directly on the table using an ODBC stage (since the data is large, I assumed I couldn't use a hashed file). The table has a single key column, of type VarChar and 20 characters long (alphanumeric), on which I am doing the lookup.
I suspect the performance problem is because the key column is a string of 20 alphanumeric characters.
Is there any alternative way to improve performance while still doing the lookup on a VarChar key?
I also tried the CRC32 function to convert the string into a unique number, but performance was still poor.
Please help me out if there is any way.
Thanks in advance.
Cheers
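For reference, the CRC32 approach mentioned above can be sketched in Python (illustrative only; DataStage's own CRC32 function may differ in detail, and the sample key values are made up):

```python
import zlib

def crc32_key(key: str) -> int:
    """Map a 20-char alphanumeric key to a 32-bit integer.
    Illustrative stand-in for a CRC32-based surrogate key."""
    return zlib.crc32(key.encode("ascii")) & 0xFFFFFFFF

k = crc32_key("AB12CD34EF56GH78IJ90")  # deterministic 32-bit value
# Note: with only 2**32 possible values, a 7-million-row key set has a
# non-trivial collision probability (birthday bound), so CRC32 alone
# cannot guarantee a *unique* number per key.
```

This also explains why CRC32 is risky as a lookup key on its own: it compresses the key, but it is a hash, not a unique identifier.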
ODBC Lookup Performance
Of course you can use a hashed file. Seven million is not so many rows. You may even be able to fit it in a standard hashed file (up to 2GB of data). Be selective about what columns you load into the hashed file and what rows you load into the hashed file - only load the ones you need.
If you still have "too much" data, make your hashed file 64-bit addressed.
If you have more than 19 million TB (the maximum capacity of a 64-bit hashed file) in your 7 million rows, then the inability to use a hashed file is the least of your problems.
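As a rough sanity check on the 2 GB limit, here is some back-of-envelope arithmetic (the 250-byte average record width is a hypothetical figure; measure your own):

```python
# Check whether 7M rows plausibly fit in a standard (32-bit, 2 GB) hashed file.
rows = 7_000_000
avg_record_bytes = 250          # assumed average stored record width
total_bytes = rows * avg_record_bytes
total_gib = total_bytes / 2**30
# 7,000,000 * 250 bytes = 1.75 GB (~1.63 GiB), under the 2 GB limit,
# which is why trimming columns and rows before loading matters so much.
```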
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Re: ODBC Lookup Performance
SharingDS wrote: I guess this performance problem is because the key column is a string of 20 alphanumeric characters.

No, it's all in the efficiency of your query, compounded by the fact that it fires once for each record being processed. As noted, 7M records is NOT huge and should be easily handled by a hashed file - unless your records are extreeeemely... "wide".
-craig
"You can never have too many knives" -- Logan Nine Fingers
7 million rows is barely worth processing. 7 billion rows is more like a healthy start.
Server Edition is exceptionally suitable for all data volumes, given the right ETL design and hardware.
Some easy steps: only put into hashed files what you need, when you need it. Don't keep persistent hashed files (a copy of your target tables in hashed file form). Use multiple job instances with partitioning of the data to create multiple transformation pipelines. Use bulk loading for inserts, and contrive a bulk updating process.
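The multiple-instance partitioning idea above can be sketched as hashing the key modulo the number of instances (the instance count and sample keys are illustrative):

```python
from zlib import crc32

NUM_INSTANCES = 4  # hypothetical number of parallel job instances

def instance_for(key: str) -> int:
    """Route each input row to one of N parallel pipelines.
    Hashing guarantees all rows with the same key hit the same instance."""
    return crc32(key.encode("ascii")) % NUM_INSTANCES

partitions = {i: [] for i in range(NUM_INSTANCES)}
for key in ("AB12", "CD34", "EF56", "AB12"):
    partitions[instance_for(key)].append(key)
# Rows with equal keys always land in the same partition, so each
# pipeline can do its lookups and loads independently of the others.
```

Keeping equal keys in one partition is what lets each instance run its own lookup and bulk load without coordinating with the others.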
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle