DSXchange

Posted: **Mon Jun 01, 2009 4:48 am**

Hi
I want to do a lookup with a table having 50 million records ,
with the source flat file having 45 millon records

First i have implimented it trough a haded file ,the hashed file created out of the table -
It was very slow

Now I have implimented a direct orcale stage lookup-It is based on two keys -the keys are B tree indexed

Here also the performance is very poor ,look up speed is 280 rows per sec

Please let me know you suggestions to handle the scenario

Posted: **Mon Jun 01, 2009 5:52 am**

What speed do you get if you do no lookups? Maybe the bottleneck is in the stream input rather than in the reference input.

Posted: **Mon Jun 01, 2009 6:37 am**

Thanks...

With out lookup it is processing 3000 to 4000 records per sec

Posted: **Mon Jun 01, 2009 4:13 pm**

Is the hashed file created/populated in the same job or in a different job? If it's created in the same job then rows/sec is meaningless because it includes the wait time while the hashed file is populated. Populate the hashed file in a separate job and be prepared to be amazed.

Posted: **Mon Jun 01, 2009 11:29 pm**

The job logic is as given below

Case 1:

Job1-Create hashed file from table of 60 million records (fetching around 45 million records based on condition )
Job2-Using the hashed file created from job for look up purpose

Job1 is taking long time to create hashed file,running 400 rows/sec

So I tried the below option

with out hashed file,Do a direct look up on indexed columns in the table..

Please suggest...

Posted: **Tue Jun 02, 2009 12:15 am**

Here are the three main ideas for getting the best out of write performance with hashed files.

Critically examine what rows and columns are loaded into the hashed file, and eliminate any that are not required.

Pre-size the hashed file using MINIMUM.MODULUS setting (get an approximate value from the current size of the hashed file, divided by the group size (2KB)).

If the hashed file is under 999MB consider maximizing your write cache buffer size and employing cached write.

DSXchange

Look up is vey slow

Look up is vey slow