Matching Names

Infosphere's Quality Product

asyafrudin
Participant
Posts: 16
Joined: Thu Oct 21, 2010 1:40 am
Location: Indonesia

Post by asyafrudin »

Hi.

I'm a newbie here. I haven't used IBM QualityStage much to date, but I have something to discuss regarding a matching process that I ran a few days ago. I hope I'm posting this in the right place.

I'll get to the point. I was assigned to match an external data set against a master data set. In the external data, NAME is the only column I can use to match against the master data.

Here's the match specification that I used for the matching job:

Pass 1:
Blocking: Name initials, Soundex of first name, and NYSIIS of first name.
Matching: Name v Name using UNCERT; m=0.9; u=0.01; Param1=700

Pass 2:
Blocking: Same as Pass 1
Matching: Name v Name using MULT_UNCERT; m=0.8; u=0.05; Param1=700

I've tried the other specifications I could think of, but the one above is the best I've come up with so far. Does anyone here have a better solution?

Thank you.
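
For reference, the m and u values in the specification above can be turned into match weights. The sketch below assumes the standard Fellegi-Sunter formulas (agreement weight log2(m/u), disagreement weight log2((1-m)/(1-u))); the function names are made up for illustration and are not part of QualityStage, and the UNCERT comparisons themselves scale the weight by string similarity, which is not modelled here.

```python
import math

def agreement_weight(m: float, u: float) -> float:
    """Weight added when the field agrees (assumed formula: log2(m/u))."""
    return math.log2(m / u)

def disagreement_weight(m: float, u: float) -> float:
    """Weight added when the field disagrees (assumed formula: log2((1-m)/(1-u)))."""
    return math.log2((1 - m) / (1 - u))

# m and u values taken from the two passes in the post above.
for label, m, u in (("Pass 1", 0.9, 0.01), ("Pass 2", 0.8, 0.05)):
    print(label,
          round(agreement_weight(m, u), 2),
          round(disagreement_weight(m, u), 2))
```

With a single field, the total score can only range between these two values, which is part of why thresholds are hard to tune on a name-only match.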
Perfection is not about making no mistakes. Perfection is about fixing your mistakes.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

A couple of things come to mind:
- Just a name isn't a lot to reliably match people on. I'd try to see if you could find a little more to work with.
- Initials to block on is way too loose, and both Soundex and NYSIIS of first name are overkill. Soundex keeps the first letter, so it makes checking an initial redundant. You might want to try something like NYSIIS of first name and NYSIIS of last name, which allows some fuzziness while covering both parts of the name.
- You have standardized the name: try something like a name_uncert or uncert on match names to help pick up nicknames. And 700 is very loose. Try 800 or 850 first, especially on earlier passes.
- Kind of pointless having exactly the same block for both passes. It's hard if all you really do have is the name, but try different options.
- Even if all you do have is the name, the more fields you can use to create a match score, the better. It makes thresholds a bit harder to tweak if you only have 1 field to get a score from. Use middle names and generationals.

Hope this helps.
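
The point about using a different block in each pass can be sketched in a few lines. This is only an illustration with made-up records: the helper `candidate_pairs` and the key functions are hypothetical, and plain upper-cased names stand in for the NYSIIS or Soundex codes a real job would block on.

```python
from collections import defaultdict

def candidate_pairs(external, master, key):
    """Pair each external record with the master records sharing its block key."""
    blocks = defaultdict(list)
    for rec in master:
        blocks[key(rec)].append(rec)
    return {(ext, mst) for ext in external for mst in blocks.get(key(ext), [])}

# Made-up (first name, last name) records.
external = [("Cathy", "Smith")]
master = [("Kathy", "Smith"), ("Cathy", "Smyth"), ("John", "Jones")]

def by_last(rec):
    return rec[1].upper()   # stand-in for a phonetic code of the last name

def by_first(rec):
    return rec[0].upper()   # stand-in for a phonetic code of the first name

pass1 = candidate_pairs(external, master, by_last)
same_again = candidate_pairs(external, master, by_last)
pass2 = candidate_pairs(external, master, by_first)

print(len(pass1 | same_again))  # repeating the same block adds no candidates
print(len(pass1 | pass2))       # a different block brings in a new candidate
```

Only the pairs that land in the same block are ever scored, so a second pass with the same block can only re-score what the first pass already saw.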
asyafrudin
Participant
Posts: 16
Joined: Thu Oct 21, 2010 1:40 am
Location: Indonesia

Post by asyafrudin »

Your suggestions were indeed helpful.

At first I only used initials to block, but the blocks were too big, so I decided to combine initials, Soundex of first name, and NYSIIS of first name. It never crossed my mind to use the Soundex or NYSIIS of the last name and combine it with the Soundex or NYSIIS of the first name. I'm curious, though: is there a specific reason why you suggested NYSIIS rather than Soundex?

I'll give all your suggestions a try. At least now I have other options to tweak my match specifications. Thanks for the help.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Soundex is for the most part looser (which is sometimes just what you need), but at the same time it insists on the first letter being correct.
Take Cathy and Kathy: blocking on Soundex would put them in different blocks, but blocking on NYSIIS would put them in the same one.
They each have their uses though.
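
A minimal sketch of American Soundex shows why the first letter matters. This is an illustrative implementation, not QualityStage's own code. Standard NYSIIS, by contrast, rewrites a leading K to C before encoding, which is why Cathy and Kathy end up with the same NYSIIS code; NYSIIS is not implemented here.

```python
def soundex(name: str) -> str:
    """American Soundex: first letter kept as-is, then up to three digits."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                 # the first letter is never encoded
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        code = codes.get(ch, "")        # vowels map to "" and reset prev
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]         # pad or truncate to four characters

print(soundex("Cathy"), soundex("Kathy"))    # codes differ only in the first letter
print(soundex("Robert"), soundex("Rupert"))  # looser on the rest of the name
```

Because the first character is copied through unchanged, any spelling variation in that position splits two names into different Soundex blocks, however similar the rest of the name is.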