Uncertain record values match in match specification

Infosphere's Quality Product

Moderators: chulett, rschirm

BradMiller
Premium Member
Posts: 87
Joined: Mon Feb 18, 2008 3:58 pm
Location: Sacramento, CA

Uncertain record values match in match specification

Post by BradMiller »

I want to find uncertain (fuzzy) matches, for example treating Will Smith, Wil Smith, Wills Smith and Wil Smit as matches. In the match specification I am blocking on the field Full Name, and under match commands I am matching on the field Full Name with the "CHAR" comparison type, m-prob 0.8 and u-prob 0.3.
Should I keep this comparison, or is it better to use the "UNCERT" character uncertainty comparison or the "NAME_UNCERT" comparison, with m-prob 0.8, u-prob 0.3 and a param 2 weight override of 800? Which one would you suggest for capturing fuzzy matches that contain typos and similar errors?
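
For a rough sense of how close these variants are, the sketch below computes plain Levenshtein edit distance between the example names. It is only an illustration: this is not the algorithm behind UNCERT or any other QualityStage comparison, and the function name `levenshtein` is made up for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character inserts, deletes or substitutions
    needed to turn a into b (classic dynamic-programming version)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


reference = "Will Smith"
for variant in ("Wil Smith", "Wills Smith", "Wil Smit"):
    print(variant, levenshtein(reference, variant))
```

Every variant is within one or two edits of "Will Smith", yet none is an exact character-for-character match, which is the gap an uncertainty-style comparison is meant to cover.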
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In the Match specification, block on NYSIIS of name and NYSIIS of first name. Use UNCERT (assuming an unduplicate match). Your u-prob figure (the probability that a match is purely by chance) is very high; it is usually set around 0.01. Is there a reason for your choice?
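
To illustrate why the u-prob matters, here is a minimal sketch assuming the usual probabilistic-matching weight formulas: agreement weight log2(m/u) and disagreement weight log2((1-m)/(1-u)). The function names are made up for this example, and the exact weights QualityStage reports may differ once a comparison such as UNCERT scales them.

```python
import math


def agreement_weight(m: float, u: float) -> float:
    """Weight added when the field agrees: log2(m / u)."""
    return math.log2(m / u)


def disagreement_weight(m: float, u: float) -> float:
    """Weight added (a negative number) when the field disagrees."""
    return math.log2((1 - m) / (1 - u))


for u in (0.3, 0.01):
    print(
        f"m=0.8 u={u}: "
        f"agree={agreement_weight(0.8, u):.2f} "
        f"disagree={disagreement_weight(0.8, u):.2f}"
    )
```

Under these formulas a u-prob of 0.3 gives an agreement weight of about 1.4, while 0.01 gives about 6.3. A high u-prob therefore makes an agreement on the name count for less, rather than producing more matches.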
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

Brad,

In your case, a Soundex value from the standardization process will serve you better as the blocking criterion, since it gives all of these records a common value. You don't want to add fields used as blocking criteria to the matching commands: you already know they are common to both records.
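
As a sketch of what a phonetic blocking key does for names like these, here is a small implementation of standard American Soundex. This is an assumption for illustration: the Soundex or NYSIIS value produced by the standardization process may be computed differently, and the function name `soundex` is made up for the example.

```python
# Digit codes for standard American Soundex; vowels, Y, H and W have no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code, ignoring non-letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


for name in ("Will Smith", "Wil Smith", "Wills Smith", "Wil Smit"):
    first, last = name.split()
    print(name, soundex(name), soundex(first), soundex(last))
```

With this variant, all four full names share one code and "Smith" and "Smit" share another, so they would land in the same block. "Wills" alone codes differently from "Will" and "Wil", which is worth keeping in mind when blocking on first name by itself.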


You would want to use the full name in your matching commands, as well as other result fields from the standardization process such as PRIMARYNAME, MATCHPRIMARYNAMEPACKKEY, etc.

Any UNCERT algorithm is good for you, but I like the new MULTI_ALIGN better because it covers all the UNCERT features and gives you more control when the tokens are in a different order.
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses"
BradMiller
Premium Member
Posts: 87
Joined: Mon Feb 18, 2008 3:58 pm
Location: Sacramento, CA

Post by BradMiller »

Thanks, Ray & Julio. It's pretty clear now. I used a u-prob of 0.3 because I expected it to give me more matches when the name is written with many typos; I assumed a higher probability would fetch more records.