Reference Match Performance

Post by **kevink** » Wed Nov 06, 2013 8:06 am

We have a reference match on standardized address and area data. The reference data set has 25 million rows and loads into the job at only 1,700 rows per second. The source data set holds only 12,000 rows at a time.

Can the experts please share ways to improve the performance of a reference match job? We would like to run this match hourly, but it currently takes 3.5 hours.
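A rough check using only the figures above suggests the reference load dominates the run time. It assumes the full 25-million-row reference set is read at the quoted rate on every run; the 12,000 source rows are negligible by comparison.

```latex
\begin{aligned}
t_{\text{load}} &\approx \frac{25\,000\,000\ \text{rows}}{1\,700\ \text{rows/s}} \approx 14\,706\ \text{s} \approx 4.1\ \text{h} \\
r_{\text{hourly}} &\geq \frac{25\,000\,000\ \text{rows}}{3\,600\ \text{s}} \approx 6\,944\ \text{rows/s} \approx 4.1 \times 1\,700\ \text{rows/s}
\end{aligned}
```

The estimated load time alone is of the same order as the observed 3.5 hours, and an hourly schedule that still reads the whole reference set would need roughly four times the current load rate. Reducing how many reference rows are read is therefore likely to matter more than tuning the match itself.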

Post by **ray.wurlod** » Wed Nov 06, 2013 9:37 am

If the source and reference data share a common key, you can load the source data's keys into a temporary table and join that table to the reference data when you extract it. That way only the reference records that are actually needed get processed, instead of all 25 million.
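A minimal sketch of the temporary-key-table idea, using SQLite purely for illustration. It assumes the two data sets share a key column; the table names `reference` and `source`, the column `match_key`, and the toy rows are all hypothetical. In a real job the same SQL would run in the database that feeds the reference link, so the match stage only receives the joined subset.

```python
import sqlite3


def extract_needed_reference(conn: sqlite3.Connection) -> list[tuple]:
    """Return only the reference rows whose key appears in the current source batch."""
    cur = conn.cursor()
    # Rebuild a temporary table holding the distinct keys of the current source batch.
    cur.execute("DROP TABLE IF EXISTS temp.src_keys")
    cur.execute("CREATE TEMP TABLE src_keys (match_key TEXT PRIMARY KEY)")
    cur.execute("INSERT INTO src_keys SELECT DISTINCT match_key FROM source")
    # Join the small key table to the large reference table so only needed rows are read.
    cur.execute(
        """
        SELECT r.match_key, r.address, r.area
        FROM reference AS r
        JOIN src_keys AS k ON r.match_key = k.match_key
        ORDER BY r.match_key
        """
    )
    return cur.fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical toy data: three reference rows, a source batch touching two keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reference (match_key TEXT, address TEXT, area TEXT)")
    # An index on the join key lets the database seek instead of scanning the reference table.
    conn.execute("CREATE INDEX reference_key_idx ON reference (match_key)")
    conn.executemany(
        "INSERT INTO reference VALUES (?, ?, ?)",
        [("K1", "1 Main St", "North"), ("K2", "2 High St", "South"), ("K3", "3 Low Rd", "East")],
    )
    conn.execute("CREATE TABLE source (match_key TEXT, raw_address TEXT)")
    conn.executemany(
        "INSERT INTO source VALUES (?, ?)",
        [("K1", "1 main street"), ("K3", "3 low road"), ("K1", "1 Main St.")],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    # Only K1 and K3 occur in the source batch, so K2 is never extracted.
    print(extract_needed_reference(demo))
    # [('K1', '1 Main St', 'North'), ('K3', '3 Low Rd', 'East')]
```

The gain comes from the ratio of source to reference rows: with 12,000 source rows per run, at most 12,000 distinct keys reach the join, so far fewer than 25 million reference rows need to be read, provided the reference key column is indexed.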

kevink · Post by **kevink** » Wed Nov 06, 2013 9:21 pm

Unfortunately, there is no common key between the two data sets. Is there any other strategy we could use?