Uncertain record values match in match specification

Posted: Wed Nov 04, 2009 1:01 pm
by BradMiller
I want to find uncertain (fuzzy) matches, for example treating
Will Smith, Wil Smith, Wills Smith and Wil Smit as matches. In the match specification I am blocking on the field Full Name, and under match commands I am matching on Full Name with the comparison type "CHAR comparison", m-prob 0.8 and u-prob 0.3.
Should I stay with this comparison, or is it better to use "UNCERT character uncertainty comparison" or "NAME UNCERT comparison" with m-prob 0.8, u-prob 0.3 and a param2 weight override of 800? Which one would you suggest to capture fuzzy matches that contain typos and similar errors?

Posted: Wed Nov 04, 2009 2:38 pm
by ray.wurlod
In the Match specification, block on NYSIIS of name and NYSIIS of first name. Use UNCERT (assuming an undup match). Your u-prob figure (the probability that a match occurs purely by chance) is very high; usually it's set around 0.01. Is there a reason for your choice?
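
To see why the u-prob matters, here is a minimal sketch of how m-prob and u-prob are usually turned into field weights in probabilistic (Fellegi-Sunter style) matching. It assumes the match engine uses the standard base-2 log formulas; the function names are made up for illustration and are not part of any product API.

```python
import math


def agreement_weight(m_prob, u_prob):
    """Weight added when the field agrees: log2(m / u)."""
    return math.log2(m_prob / u_prob)


def disagreement_weight(m_prob, u_prob):
    """Weight added when the field disagrees: log2((1 - m) / (1 - u)).

    This is negative whenever m is larger than u.
    """
    return math.log2((1.0 - m_prob) / (1.0 - u_prob))


# Compare the u-prob from the question (0.3) with the usual 0.01.
for u in (0.3, 0.01):
    agree = agreement_weight(0.8, u)
    disagree = disagreement_weight(0.8, u)
    print(f"m=0.8 u={u}: agree={agree:.2f} disagree={disagree:.2f}")
```

Under these formulas an agreement on the name is worth roughly 1.4 with u-prob 0.3 but roughly 6.3 with u-prob 0.01, so a high u-prob makes an agreeing field count for less, not more.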

Posted: Wed Nov 04, 2009 2:38 pm
by JRodriguez
Brad,

In your case a soundex value from the standardization process will serve you better for matching criteria. That should add a common value for all your records. You don't want to add fields used as blocking criteria to the matching commands ... you already know that they are common to both records.

You would want to use full name in your matching commands, as well as other result fields from the standardization process such as PRIMARYNAME, MATCHPRIMARYNAMEPACKKEY, etc.

Any UNCERT algorithm would work for you, but I like the new MULTI_ALIGN better because it covers all the UNCERT features and gives you more control when the tokens are in a different order.
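
As a rough illustration of the idea (block on a phonetic code, then score the full name with a fuzzy comparison), here is a small Python sketch. Everything in it is an assumption for demonstration: the soundex function is a hand-written American Soundex and may differ from what the standardization process produces, and difflib's SequenceMatcher is only a stand-in for a fuzzy score, not the UNCERT or MULTI_ALIGN algorithm.

```python
from difflib import SequenceMatcher

# Letter-to-digit table for American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character American Soundex code of a single word."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate repeated codes
        code = CODES.get(ch, "")
        if code and code != prev:
            out += code
        prev = code  # a vowel resets prev, so a repeat after it is kept
    return (out + "000")[:4]


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_sorted_similarity(a, b):
    """Same ratio, but ignoring the order of the name tokens."""
    def norm(s):
        return " ".join(sorted(s.lower().split()))
    return similarity(norm(a), norm(b))


names = ["Will Smith", "Wil Smith", "Wills Smith", "Wil Smit"]
for full_name in names:
    first, last = full_name.split()
    print(full_name, soundex(first), soundex(last),
          round(similarity(names[0], full_name), 3))
```

All four surnames get the same code (S530), so blocking on the surname code keeps them together, while "Wills" codes to W420 instead of W400 and would fall into a different block if the first-name code were also a blocking field.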

Posted: Wed Nov 04, 2009 3:58 pm
by BradMiller
Thanks Ray & Julio. It's pretty clear now. I used a u-prob of 0.3 because I thought it would give me more matches when the name is written with many typos. I used it to fetch more records, since the probability is higher.
