I want to find uncertain (fuzzy) matches, for example treating Will Smith, Wil Smith, Wills Smith and Wil Smit as matches. In the match specification I am blocking on the field Full Name, and under match commands I am matching on the field Full Name with the comparison type "CHAR comparison", m-prob 0.8 and u-prob 0.3.
Should I stay with this comparison, or is it better to use "UNCERT character uncertainty comparison" or "NAME_UNCERT comparison" with m-prob 0.8, u-prob 0.3 and param2 weight override 800? Which one would you suggest for capturing fuzzy matches that contain typos and similar errors?
Uncertain record values match in match specification
-
- Premium Member
- Posts: 87
- Joined: Mon Feb 18, 2008 3:58 pm
- Location: Sacramento, CA
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
In the Match specification, block on NYSIIS of name and NYSIIS of first name. Use UNCERT (assuming an undup match). Your u-prob figure (the probability that a match is purely by chance) is very high; usually it's set around 0.01. Is there a reason for your choice?
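To make the u-prob point concrete, here is a minimal sketch of the standard Fellegi-Sunter weight formulas used in probabilistic matching: agreement weight log2(m/u) and disagreement weight log2((1-m)/(1-u)). The function names are my own for illustration, and the numbers are just the 0.8 / 0.3 / 0.01 values from this thread, not output from any tool.

```python
import math


def agreement_weight(m: float, u: float) -> float:
    """Weight added when the field agrees: log2(m / u)."""
    return math.log2(m / u)


def disagreement_weight(m: float, u: float) -> float:
    """Weight added (a negative number) when the field disagrees."""
    return math.log2((1 - m) / (1 - u))


if __name__ == "__main__":
    m = 0.8
    # Compare the u-prob from the question with the more usual 0.01.
    for u in (0.3, 0.01):
        print(
            f"m={m}, u={u}: "
            f"agree={agreement_weight(m, u):+.2f}, "
            f"disagree={disagreement_weight(m, u):+.2f}"
        )
```

With u-prob at 0.3 an agreement on the name contributes only a small positive weight, so it does little to separate matches from non-matches. Lowering u-prob towards 0.01 makes the same agreement count for much more.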
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 425
- Joined: Sat Nov 19, 2005 9:26 am
- Location: New York City
- Contact:
Brad,
In your case a Soundex value from the standardization process will serve you better for matching criteria. That should add a common value for all your records. You don't want to add fields used as blocking criteria to the matching commands ... you already know that they are common to both records.
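As a rough illustration of why a phonetic code gives the variants a common value, here is a small American Soundex sketch in plain Python. This is my own illustrative implementation, not the code the standardization process actually generates, and the function name is hypothetical.

```python
def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for a single word."""
    codes = {}
    groups = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
              ("L", "4"), ("MN", "5"), ("R", "6"))
    for letters, digit in groups:
        for ch in letters:
            codes[ch] = digit

    chars = [c for c in name.upper() if c.isalpha()]
    if not chars:
        return ""

    result = chars[0]
    prev = codes.get(chars[0], "")
    for ch in chars[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


if __name__ == "__main__":
    for full_name in ("Will Smith", "Wil Smith", "Wills Smith", "Wil Smit"):
        first, last = full_name.split()
        print(full_name, "->", soundex(first), soundex(last))
```

In this sketch all four surnames get the same code, so they land in the same block. "Wills" gets a different first-name code from "Will" and "Wil", which is the kind of case where a second, looser blocking pass can help.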
You would want to use Full Name in your matching commands, as well as other result fields from the standardization process such as PRIMARYNAME, MATCHPRIMARYNAMEPACKKEY, etc.
Any UNCERT algorithm is good for you, but I like the new MULTI_ALIGN better because it covers all the UNCERT features and also gives you more control when the tokens are in a different order.
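For a feel of what an uncertainty-style comparison does with typos, here is a sketch that uses Python's difflib as a stand-in. This is not the UNCERT or MULTI_ALIGN algorithm, and the 0.8 cutoff is a hypothetical value on difflib's 0 to 1 scale, unrelated to the 800 parameter mentioned above.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical cutoff for this sketch only.
THRESHOLD = 0.8

reference = "Will Smith"
candidates = ["Wil Smith", "Wills Smith", "Wil Smit", "Jane Doe"]

# Keep the candidates whose similarity to the reference clears the cutoff.
matches = [c for c in candidates if similarity(reference, c) >= THRESHOLD]

if __name__ == "__main__":
    for c in candidates:
        print(f"{c!r}: {similarity(reference, c):.3f}")
    print("matches:", matches)
```

An exact character comparison would reject all three variants, while a tolerant comparison with a cutoff keeps them and still drops the unrelated name.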
Julio Rodriguez
ETL Developer by choice
"Sure we have lots of reasons for being rude - But no excuses"