performance issue

Infosphere's Quality Product

Moderators: chulett, rschirm

umamahes
Premium Member
Posts: 110
Joined: Tue Jul 04, 2006 9:08 pm

performance issue

Post by umamahes »

Hi,
I have 60 million recoeds in one table and i am getting 10,000 records in other file and i am matching the records in these two tables.How is the perofrmance of the quality stage.How much time does it take to process.Is there any issues with the processing times and performance.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

How long is a piece of string?

That's an unfair question; it depends on so many things, such as your hardware (especially CPU speed and amount of memory), parallelism, and what kind of operation you are performing. For example, a two-file match may need to compare every row in one data set with every row in the other (a Cartesian product, in effect) - in your case 10,000 x 60,000,000 comparisons.

How fast could YOU do that?
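To put that comparison count in perspective, here is a rough back-of-envelope sketch. The throughput figure is an assumption for illustration only, not a measured QualityStage rate:

```python
# Hypothetical sizing estimate: the comparison rate below is made up,
# not a benchmarked QualityStage figure.
small = 10_000        # rows in the small file
large = 60_000_000    # rows in the big table

pairs = small * large           # worst case: full cross-product
print(pairs)                    # 600 billion candidate pairs

rate = 1_000_000                # assumed comparisons per second
hours = pairs / rate / 3600
print(round(hours, 1))          # 166.7 hours at that assumed rate
```

The real answer depends entirely on how far blocking cuts that cross-product down, which is why the question cannot be answered in the abstract.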
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
boxtoby
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby »

Might be a bit late with this reply, but hey!

Ray's reply is correct, but there are things you can do that will help.

First of all, the more passes you have, the longer it will take, fairly obviously.

Secondly, check the block sizes on the match report. If you are getting large block sizes, this WILL slow down the match considerably. Before you ask, I'm not sure exactly what counts as a 'large block size' - probably over 100, maybe less. Check the progress in the logs; some give a warning of large block sizes.

Thirdly, use the buffer options on the 'Advanced' tab to increase the memory available to the match.

Finally, depending on your post-match requirements, consider outputting only keys from the match and then joining to the main data in your database. This will save space.
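The point of small block sizes is that only rows sharing a blocking key are compared, instead of every possible pair. A toy sketch of the idea (the field names and data here are made up, and this is generic blocking logic, not QualityStage's internal implementation):

```python
from collections import defaultdict

# Toy illustration of match blocking: only rows that share a blocking
# key (here a fake "postcode" field) are compared against each other.
big = [("A1", "John Smith"), ("A1", "Jon Smyth"), ("B2", "Ann Lee")]
small = [("A1", "John Smith"), ("B2", "Anne Lee")]

# Index the big file by blocking key.
blocks = defaultdict(list)
for key, name in big:
    blocks[key].append(name)

# Compare each small-file row only against its own block.
comparisons = 0
for key, name in small:
    for candidate in blocks.get(key, []):
        comparisons += 1    # real match scoring would run here

print(comparisons)  # 3 pairs instead of the full 2 x 3 = 6
```

The larger a block grows, the closer that inner loop gets to the full Cartesian product, which is why oversized blocks slow the match down so badly.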

Hope this helps!

Bob Oxtoby
rohank
Participant
Posts: 2
Joined: Tue Feb 24, 2004 4:49 am

Fuzzy Matching performance

Post by rohank »

Hi,
Similar to this question, i have a query on performance of matching records in QS.

We have two flat files, one containing 3 million records and other one around 10,000 records. We are trying to do a fuzzy match on two of the fields. When I say fuzzy match, it means simalr sounding matching based on the rules set defined in QS.
This process is taking a period of around 4-5 hours to complete on a daily basis.
Is there any way by which this process can be improved?
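For similar-sounding matching, one common way to cut the candidate pairs down is to block on a phonetic code so only names that sound alike are ever compared. A minimal sketch using standard Soundex (an assumption for illustration - not necessarily the exact rule set QualityStage applies internally):

```python
def soundex(name):
    """Standard 4-character Soundex code, e.g. 'Smith' -> 'S530'."""
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    name = name.upper()
    out = name[0]                      # keep the first letter as-is
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:      # skip vowels and repeated codes
            out += code
        if ch not in "HW":             # H/W do not break a run of duplicates
            prev = code
    return (out + "000")[:4]           # pad/truncate to 4 characters

print(soundex("Smith"), soundex("Smyth"))  # S530 S530
```

Grouping both files by a code like this before the fuzzy comparison turns 10,000 x 3,000,000 pairs into a far smaller number per block, which is usually where the hours go.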

Thanks,
Rohan