performance issue

Posted: Sat Jan 13, 2007 5:21 pm
by umamahes
Hi,
I have 60 million records in one table and I receive 10,000 records in another file, and I am matching the records between the two. How does QualityStage perform on this? How long does it take to process? Are there any issues with processing time or performance?

Thanks

Posted: Sat Jan 13, 2007 6:22 pm
by ray.wurlod
How long is a piece of string?

That's an unfair question; it depends on so many things, such as your hardware (especially CPU speed and amount of memory), the degree of parallelism, and the kind of operation you are performing. For example, a two-file match may need to compare every row in one data set with every row in the other (a Cartesian product, in effect): in your case, 10,000 x 60,000,000 comparisons.

How fast could YOU do that?
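
To put a rough number on the comparison count mentioned above, here is a minimal Python sketch. The throughput of one million comparisons per second is purely an assumption for illustration, not a measured QualityStage rate, and the function names are hypothetical.

```python
def pair_count(rows_a, rows_b):
    """Number of comparisons if every row in A is compared with every row in B."""
    return rows_a * rows_b


def estimated_hours(comparisons, per_second):
    """Elapsed hours at an ASSUMED steady comparison rate."""
    return comparisons / per_second / 3600


# Figures from the question: 10,000 incoming rows against 60 million rows.
total = pair_count(10_000, 60_000_000)

# ASSUMPTION: 1,000,000 comparisons per second (illustrative only).
hours = estimated_hours(total, 1_000_000)

print(f"{total:,} comparisons, about {hours:.1f} hours at the assumed rate")
```

The full Cartesian product is 600 billion comparisons, which is why the match has to be narrowed down before any hardware tuning makes a difference.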

Posted: Wed Jan 31, 2007 10:51 am
by boxtoby
Might be a bit late with this reply, but hey!

Ray's reply is correct, but there are things you can do that will help.

First of all, the more passes you have the longer it will take, fairly obviously.

Secondly, check the block sizes on the match report. If you are getting large block sizes, this WILL slow down the match considerably. Before you ask, I'm not sure what a 'large block size' is! Probably over 100, maybe less? Check the progress in the logs; some give a warning of large block sizes.
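
Why large blocks hurt can be illustrated with a small Python sketch. This is not QualityStage code, just the general idea of blocking: in a two-file match, only rows that share a blocking key are compared, so each block costs roughly (rows from file A) x (rows from file B) comparisons. The name `block_report`, the single-letter keys, and the threshold of 100 (borrowed from the guess above) are all assumptions for illustration.

```python
from collections import Counter


def block_report(keys_a, keys_b, warn_at=100):
    """Return comparisons per shared block and the blocks considered oversized.

    keys_a / keys_b are the blocking-key values of each row in the two files.
    warn_at is an ASSUMED threshold, not a documented product limit.
    """
    count_a = Counter(keys_a)
    count_b = Counter(keys_b)
    # Only keys present in both files produce any comparisons.
    shared = count_a.keys() & count_b.keys()
    pairs = {key: count_a[key] * count_b[key] for key in shared}
    oversized = sorted(
        key for key in shared if max(count_a[key], count_b[key]) > warn_at
    )
    return pairs, oversized


# Hypothetical sample: key "S" is heavily populated in file A.
keys_a = ["S"] * 150 + ["J"] * 3
keys_b = ["S"] * 2 + ["J"] + ["Q"]

pairs, oversized = block_report(keys_a, keys_b)
blocked_total = sum(pairs.values())
unblocked_total = len(keys_a) * len(keys_b)

print(pairs, oversized, blocked_total, unblocked_total)
```

In this toy sample, blocking cuts the work from 612 comparisons to 303, but almost all of what remains sits in the one oversized block, which is the pattern to look for in the match report.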

Thirdly, use the buffer options on the 'Advanced' tab to increase the memory available to the match.

Finally, depending on your post-match requirements, consider outputting only the keys from the match and then joining them to the main data in your database. This will save space.
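
The keys-only idea can be sketched as follows, using Python's built-in `sqlite3` module as a stand-in for whatever database you actually use. The table and column names (`master`, `match_keys`, `ref_id`, `master_id`) and the sample rows are hypothetical; the point is only that the match writes narrow key pairs and the wide columns are fetched afterwards with a join.

```python
import sqlite3

# In-memory database as a stand-in for the real one (ASSUMPTION).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE master (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE match_keys (ref_id INTEGER, master_id INTEGER);
    """
)

# Wide master data stays in the database.
conn.executemany(
    "INSERT INTO master VALUES (?, ?, ?)",
    [(1, "Ann Smith", "Leeds"), (2, "Bob Jones", "York"), (3, "Cy Young", "Hull")],
)

# The match output carries only the key pairs, not the full records.
conn.executemany("INSERT INTO match_keys VALUES (?, ?)", [(101, 2), (102, 3)])

# Join back to the master data only for the rows that matched.
rows = conn.execute(
    """
    SELECT k.ref_id, m.id, m.name, m.city
    FROM match_keys AS k
    JOIN master AS m ON m.id = k.master_id
    ORDER BY k.ref_id
    """
).fetchall()

print(rows)
conn.close()
```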

Hope this helps!

Bob Oxtoby

Fuzzy Matching performance

Posted: Wed Feb 07, 2007 1:12 am
by rohank
Hi,
Similar to this question, I have a query on the performance of matching records in QS.

We have two flat files, one containing 3 million records and the other around 10,000 records. We are trying to do a fuzzy match on two of the fields. By fuzzy match, I mean similar-sounding matching based on the rule set defined in QS.
This process takes around 4-5 hours to complete each day.
Is there any way to improve this?

Thanks,
Rohan