stuartjvnorton wrote: A couple of things come to mind:
- Just a name isn't a lot to reliably match people on. I'd try to see if you could find a little more to work with.
- Blocking on initials is way too loose, and using both soundex and NYSIIS of first name is overkill. Soundex keeps the first letter, so it makes checking an initial redundant. You might want to try something like NYSIIS of first name and NYSIIS of last name: that allows some fuzziness but covers multiple fronts.
- You have standardized the name: try something like a name_uncert or uncert on match names to help pick up nicknames. And 700 is very loose. Try 800 or 850 first, especially on earlier passes.
- Kind of pointless having exactly the same block for both passes. It's hard if all you really do have is the name, but try different options.
- Even if all you do have is the name, the more fields you can use to create a match score, the better. It makes thresholds a bit harder to tweak if you only have 1 field to get a score from. Use middle names and generationals.
Hope this helps.
Your suggestions were indeed helpful.
At first I only used initials to block, but the blocks were too big. So I decided to combine initials, soundex of first name, and NYSIIS of first name. Yet it never crossed my mind to use the soundex or NYSIIS of last name and combine it with the soundex or NYSIIS of first name. Though I'm curious, is there a specific reason why you suggested the combination of NYSIIS instead of soundex?
I'll give all your suggestions a try. At least now I have other options to tweak my match specifications. Thanks for the help.
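For anyone following along, here is a rough Python sketch of the compound block idea discussed above (phonetic code of first name plus phonetic code of last name). It is only an illustration, not the matching tool's own implementation: the function names `soundex` and `block_key` and the sample names are made up for this post, and I use standard American Soundex because it is short to write out. A NYSIIS function could be dropped in the same way.

```python
from collections import defaultdict

# Letter-to-digit table for standard American Soundex.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code, or "" if there are no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0]                      # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                      # H and W do not split a run of equal codes
        code = _CODES.get(ch, "")         # vowels map to "" and reset the run
        if code and code != prev:
            out += code
        prev = code
    return (out + "000")[:4]              # pad or truncate to 4 characters


def block_key(first, last):
    """Hypothetical compound block: phonetic code of first AND last name."""
    return soundex(first) + "|" + soundex(last)


# Illustrative records only.
people = [("Jon", "Smith"), ("John", "Smyth"), ("Jane", "Smith"), ("Robert", "Jones")]
blocks = defaultdict(list)
for first, last in people:
    blocks[block_key(first, last)].append((first, last))

for key, members in blocks.items():
    print(key, members)
```

With these sample names, Jon Smith, John Smyth and Jane Smith all land in the block J500|S530, while Robert Jones gets a block of his own. The code always starts with the name's first letter, which is why an extra initial in the block adds nothing, and the block is still loose enough that the match score has to separate Jon from Jane.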
Perfection is not about making no mistakes. Perfection is about fixing your mistakes.