Match Frequency stage usage

Infosphere's Quality Product

Moderators: chulett, rschirm

verify
Premium Member
Posts: 99
Joined: Sun Mar 30, 2008 8:35 am

Match Frequency stage usage

Post by verify »

How does the output of the Match Frequency stage aid the matching process? I understand that in an unduplicate match we use both the Match Frequency stage output and the standardized data as inputs to the Unduplicate Match stage. How should we interpret the output columns of the Match Frequency stage?

Thanks.
RK Raju
vairus
Participant
Posts: 52
Joined: Thu Feb 07, 2008 8:02 am
Location: Johannesburg

Re: Match Frequency stage usage

Post by vairus »

The Match Frequency stage generates frequency data that tells you how often a particular value appears in a particular column.

Example:

In the name column, "john" appears more than 50 times and another value, "johan", appears 2 times. If you are matching on name and address, the matching process gives more weight when "johan" is matched to "johan" than when "john" is matched to "john", because agreement on a rare value is less likely to happen by chance.
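To make the idea concrete, here is a rough Python sketch of how value frequencies can translate into agreement weights. It is not how QualityStage computes its weights internally (that is not documented); it only follows the general Fellegi-Sunter idea of log2(m / u), with m assumed to be about 1 and u approximated by the relative frequency of the value. The function names, the sample data (including the "mary" filler rows) and the u_floor parameter are all made up for illustration.

```python
import math
from collections import Counter


def value_frequencies(values):
    """Count how often each value occurs in one column."""
    return Counter(values)


def agreement_weight(value, freq, u_floor=1e-9):
    """Illustrative frequency-based agreement weight.

    Assumption: m (chance of agreement for a true match) is about 1 and
    u (chance of agreement for a non-match) is approximated by the
    relative frequency of the value, so the weight is log2(1 / u).
    """
    total = sum(freq.values())
    u = max(freq[value] / total, u_floor)
    return math.log2(1.0 / u)


# Hypothetical name column: a common value, a rare value and some filler.
names = ["john"] * 50 + ["johan"] * 2 + ["mary"] * 48
freq = value_frequencies(names)

w_john = agreement_weight("john", freq)
w_johan = agreement_weight("johan", freq)

print(f"john  count={freq['john']:>2} weight={w_john:.2f}")
print(f"johan count={freq['johan']:>2} weight={w_johan:.2f}")
```

With this sample data the rare value "johan" gets a noticeably higher agreement weight than the common value "john", which is the behaviour described above.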

The Match Frequency output contains the value, its count, and other statistical information such as 0000 0011 ... (I don't know how this is generated or what it means). It is a statistical value generated by the application, and it is used by the various matching algorithms.

This information is not documented, because as users we do not work with these statistics directly.

The statistical information from the Match Frequency stage makes the QualityStage matching process more precise.


If you have a statistical background, the theory of record linkage as implemented with WebSphere QualityStage is explained in depth in the following papers:

Fellegi, I. P. and Sunter, A. B. (1969) "A Theory for Record Linkage," Journal of the American Statistical Association, 64, 1183-1210.

Jaro, M. A. (1989) "Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida," Journal of the American Statistical Association, 84, No. 406, 414-420.

Regards
Vairamuthu