Match Frequency stage usage

Posted: Wed Sep 10, 2008 10:50 pm
by verify
How does the output of the Match Frequency stage aid the matching process? I understand that in an unduplicate match we use both the Match Frequency stage output and the standardised data as inputs to the Unduplicate Match stage. How should we interpret the output columns of the Match Frequency stage?


Re: Match Frequency stage usage

Posted: Thu Sep 11, 2008 12:59 am
by vairus
The Match Frequency stage generates frequency data that tells you how often a particular value appears in a particular column.
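As a rough illustration of what "frequency data" means, the sketch below counts how often each value occurs in one column of a small, made-up data set. The helper name `value_frequencies` and the sample rows are hypothetical; this only shows the idea of a value/count table and is not how QualityStage computes or stores its frequency output.

```python
from collections import Counter

def value_frequencies(rows, column):
    """Count how often each value appears in one column (hypothetical helper)."""
    return Counter(row[column] for row in rows)

# Hypothetical sample data: one "name" column with repeated values.
rows = [{"name": n} for n in ["john", "mary", "john", "johan", "john", "mary"]]
freqs = value_frequencies(rows, "name")
print(freqs.most_common())  # [('john', 3), ('mary', 2), ('johan', 1)]
```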


For example, suppose that in the name column "john" appears more than 50 times and "johan" appears only 2 times, and you are matching on name and address. In the matching process, more weight is given when "johan" matches "johan" than when "john" matches "john", because agreement on a rare value is stronger evidence that the two records refer to the same person.
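A minimal sketch of why a rare value earns more weight, using the general Fellegi-Sunter idea of an agreement weight log2(m/u). It assumes that u (the chance two unrelated records agree on a given value) is approximated by that value's relative frequency in the column, and that m (the chance a true match agrees on the field) is a hypothetical 0.9. The function name, the m value, and the "other" filler count are all illustrative assumptions, not QualityStage's internal formula.

```python
import math
from collections import Counter

def agreement_weight(value, freqs, m=0.9):
    """Frequency-adjusted agreement weight log2(m / u) (illustrative sketch).

    Assumptions, not QualityStage internals:
    - m: probability the field agrees for a true match (hypothetical 0.9).
    - u: approximated by the value's relative frequency in the column.
    """
    total = sum(freqs.values())
    u = freqs[value] / total
    return math.log2(m / u)

# Hypothetical column: 50 "john", 2 "johan", 948 other names.
freqs = Counter({"john": 50, "johan": 2, "other": 948})
for name in ("john", "johan"):
    print(name, round(agreement_weight(name, freqs), 2))
```

With these assumed numbers "johan" receives the larger weight, because its low frequency makes a chance agreement unlikely.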

The Match Frequency output contains the value, its count, and other statistical information (values such as 0000 0011 ...). I don't know how these values are generated or what they mean; they are statistics generated by the application and used by the various matching algorithms.

This information is not documented, because as users we do not work with these statistics directly.

The statistical information from the Match Frequency stage makes the QualityStage (QS) matching process more precise.

If you have a statistical background, the theory of record linkage as implemented in WebSphere QualityStage is explained in depth in the following papers:

Fellegi, I. P. and Sunter, A. B. (1969) "A Theory for Record Linkage," Journal of the American Statistical Association, 64, 1183-1210.

Jaro, M. A. (1989) "Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida," Journal of the American Statistical Association, 84, No. 406, 414-420.
