Problem with name and address match

Posted: Wed Dec 23, 2009 7:40 am
by jeesim
Hi,

I am using the Unduplicate stage to match on name (first, middle and last) and address. About 99% of the matches come out as required.

I have a problem with people who reside at the same address and whose first names start with the same initial. In this case QualityStage considers them a match. When we have only the first initial for one person and the other person's name starts with the same initial, we do want them to be considered a match.

Example:
Jean Doe 123 Main Street, Warren, NJ, 09088
James Doe 123 Main Street, Warren, NJ, 09088

QualityStage considers them a match, but they are different people.

J Doe 123 Main Street, Warren, NJ, 09088
John Doe 123 Main Street, Warren, NJ, 09088

QualityStage considers them a match, and we are fine with this result.
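
The desired behavior in the two examples above can be sketched in Python as follows. This only illustrates the rule being asked for, not how QualityStage compares names; the function name `first_names_compatible` is hypothetical.

```python
def first_names_compatible(a: str, b: str) -> bool:
    """Return True when two first names could belong to the same person.

    Hypothetical rule: a bare initial matches any name starting with that
    letter; two full names must be identical (case-insensitive).
    """
    a, b = a.strip().upper(), b.strip().upper()
    if not a or not b:
        return False  # missing name: treat as no evidence of a match
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]  # initial vs. name: compare the first letter only
    return a == b  # both are full names: require exact agreement


print(first_names_compatible("J", "John"))      # True
print(first_names_compatible("Jean", "James"))  # False
```

A real rule would probably also need to tolerate misspellings and nicknames, which this sketch deliberately ignores.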

The columns used in blocking are:
CitynameNYSIIS_USAREA
ZIPCODE_USAREA
StreetName_NYSIIS_USAADDR
MatchPrimarywordNYSIIS_USNAME
MatchFirstnameNYSISS_USNAME
HouseNumber_USADDR



The columns used in matching are:

HouseNumber_USADDR
StreetPrefixDirectional_USADDR
StreetPrefixtype_USADDR
StreetName_USADDR
StreetSuffixDirectional_USADDR
StreetSuffixtype_USADDR
UnitType_USADDR
UnitValue_USADDR
ZipCode_USAREA
Zip4AddonCode_USAREA
MatchFirstName_USNAME
MatchPrimaryName_USNAME
NameGeneration_USNAME

Thanks
jeesim

Posted: Wed Dec 23, 2009 3:59 pm
by ray.wurlod
Create another field containing the initial and match on that as CHAR.

Posted: Wed Dec 23, 2009 4:43 pm
by stuartjvnorton
What is the cutoff?
If you haven't set one, it defaults to 0, and as long as the addresses match there is probably more than enough weight for the pair to be considered a match, regardless of who lives there.
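
A rough sketch of why that happens, assuming the composite weight is simply the sum of the per-field weights and a pair is a match when that sum reaches the cutoff. The field names and weight values below are made up for illustration and do not come from any real match specification.

```python
def is_match(field_weights: dict, cutoff: float) -> bool:
    """Sum the per-field weights and compare the composite to the cutoff."""
    return sum(field_weights.values()) >= cutoff


# Illustrative weights only: the address and surname agree, the first name does not.
weights = {
    "HouseNumber": 12.0,
    "StreetName": 15.0,
    "ZipCode": 14.0,
    "PrimaryName": 16.0,
    "FirstName": -4.0,  # disagreement weight for the first name
}

composite = sum(weights.values())
print(composite)                     # 53.0
print(is_match(weights, cutoff=0))   # True
print(is_match(weights, cutoff=60))  # False
```

With a cutoff of 0, the one disagreeing field is swamped by the agreeing address fields.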

Posted: Thu Dec 24, 2009 7:44 am
by jeesim
The cutoff is set to 58. The weight in most of these matches is about 68.
We need to consider the first name, as we are consolidating at customer level, not at household level.

Has anyone come across this problem?

Thanks
Jeesim

Posted: Thu Dec 24, 2009 3:28 pm
by ray.wurlod
How is the pattern I+ being handled by your rule set?

Posted: Mon Dec 28, 2009 11:52 am
by jeesim
The I+ pattern is handled as First Name - I, Last Name - +.

My problem is not with the first initial. My issue is the way QualityStage handles names starting with the same initial.
QualityStage considers James, Jerry, Jane, etc. to be the same individual because they have the same address. In a business scenario they are different customers or patients.

Posted: Mon Dec 28, 2009 3:34 pm
by ray.wurlod
That is only true if you match on them. If you match on first initial, every name with the same first initial will be "the same".

Posted: Mon Dec 28, 2009 4:38 pm
by stuartjvnorton
What score do you get when you match a record against itself?
Maybe your cutoff needs to be a little higher, and you need a separate field to use in a separate pass for matching initials to names.

What scores do you get for the individual fields when you match Jerry and Jacob?
Do you see the disagreement score and it's still high enough to get past the cutoff, or do you actually see the agreement score? What type of match are you using for the comparison for that field? What is your Comparison Threshold set at?
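
For background on the agreement and disagreement scores: in the standard probabilistic record-linkage model (Fellegi-Sunter), each field's weights are derived from an m-probability (the field agrees given a true match) and a u-probability (the field agrees by chance). The sketch below assumes that model; the m and u values are invented, and the weights in your own match specification may be computed or overridden differently.

```python
import math


def agreement_weight(m: float, u: float) -> float:
    """Weight added when the field agrees: log2(m / u)."""
    return math.log2(m / u)


def disagreement_weight(m: float, u: float) -> float:
    """Weight added (normally negative) when the field disagrees."""
    return math.log2((1 - m) / (1 - u))


# Invented probabilities for a first-name field.
m, u = 0.9, 0.01
print(agreement_weight(m, u) > 0)     # True: agreement adds to the composite
print(disagreement_weight(m, u) < 0)  # True: disagreement subtracts from it
```

If the first-name field shows its agreement weight for Jerry and Jacob, the comparison itself is too loose; if it shows the disagreement weight, the penalty is simply too small against the address fields.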

Posted: Mon Jan 04, 2010 11:37 am
by dsqspro
Create a field holding the first three characters of the first name and apply a heavy negative weight if the three letters don't match.
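
A minimal Python sketch of that idea, under assumptions: the helper names and weight values are hypothetical, and treating a name shorter than three letters (such as a bare initial) as neutral is an added assumption so that "J" against "John" is not penalized.

```python
def first3(name: str) -> str:
    """First three characters of a first name, normalized to upper case."""
    return name.strip().upper()[:3]


def prefix_weight(a: str, b: str, agree: float = 5.0, disagree: float = -20.0) -> float:
    """Weight contributed by the three-character prefix field (illustrative values)."""
    pa, pb = first3(a), first3(b)
    # Assumption: fewer than three letters on either side gives no evidence.
    if len(pa) < 3 or len(pb) < 3:
        return 0.0
    return agree if pa == pb else disagree


print(prefix_weight("Jean", "James"))  # -20.0
print(prefix_weight("J", "John"))      # 0.0

# Rough arithmetic with the figures from this thread: a pair scoring about 68
# drops to 48 after the penalty, which is below the cutoff of 58.
print(68 + prefix_weight("Jean", "James") < 58)  # True
```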