Reference Match Performance

Posted: Wed Nov 06, 2013 8:06 am
by kevink
We have a reference match on standardized address and area data. The reference data set has 25 million rows and loads into the job at only 1,700 rows per second. The source data set contains only 12,000 rows at a time.

Can the experts please share ways to improve the performance of a reference match job? We would like to run this match hourly, but it currently takes 3.5 hours. :(

Posted: Wed Nov 06, 2013 9:37 am
by ray.wurlod
If the source and reference data share a common key, you can build a temporary table of the source data keys and join it to the reference data during extraction, so that only the reference records that are actually needed flow into the match.
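
For illustration, here is a minimal sketch of that idea using Python's built-in sqlite3 module. This is only a sketch: the table and column names (reference_data, src_keys, match_key) are hypothetical, and in a real job the temporary table and join would be pushed down into whatever database holds the reference data rather than done in SQLite.

import sqlite3

# In-memory database standing in for the reference source; all table
# and column names here are hypothetical placeholders.
conn = sqlite3.connect(":memory:")

# Stand-in for the 25-million-row reference table.
conn.execute("CREATE TABLE reference_data (match_key TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO reference_data VALUES (?, ?)",
    [("A1", "10 Main St"), ("B2", "22 High St"), ("C3", "5 Park Ave")],
)

# Step 1: load the source keys (~12,000 in the scenario above) into a
# temporary table.
conn.execute("CREATE TEMP TABLE src_keys (match_key TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO src_keys VALUES (?)", [("A1",), ("C3",)])

# Step 2: extract only the reference rows whose key appears among the
# source keys, instead of streaming the whole reference table into the job.
rows = conn.execute(
    "SELECT r.match_key, r.address "
    "FROM reference_data r JOIN src_keys s ON r.match_key = s.match_key"
).fetchall()
print(rows)  # [('A1', '10 Main St'), ('C3', '5 Park Ave')]

The payoff is that only the reference rows keyed to the 12,000 source rows ever leave the database, rather than all 25 million.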

Posted: Wed Nov 06, 2013 9:21 pm
by kevink
Unfortunately, there is no common key between the two data sets. Is there any other strategy we might be able to use?