Matching question
Posted: Mon Oct 17, 2011 3:13 pm
Dear all
I am trying to use QualityStage to find matches within our customer master.
The challenge is that I do not have a whole lot of input to go on. All I have is a file with two columns: customer name and state.
For this post please assume data is specific to US only.
I have not used standardization at all; I am just taking the raw data from the file (customer name and state) and trying to find a match.
To narrow the blocking criteria, I am taking the first four characters of the name from each source.
So my blocking is on COUNTRY, STATE, and the first four characters of the name.
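The blocking described above can be sketched as a composite key. This is a hypothetical Python helper (`block_key`), not QualityStage's implementation. It assumes the country is the constant "US" (per the US-only assumption) and that the raw, unstandardized name is only trimmed and uppercased before its first four characters are taken.

```python
def block_key(country: str, state: str, name: str) -> tuple:
    # Hypothetical helper: composite block key of (country, state,
    # first four characters of the raw name). No standardization is done.
    return (country.strip().upper(), state.strip().upper(), name.strip().upper()[:4])


query = block_key("US", "MA", "HARVARD UNIV")
candidate = block_key("US", "MA", "HARVARD BIOLOGY")
print(query, candidate, query == candidate)
# ('US', 'MA', 'HARV') ('US', 'MA', 'HARV') True
```

Under these assumptions both names fall into the same block, so the blocking key by itself should not keep the pair apart.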
In my example, I pass one record:
HARVARD UNIV, MA
It does get decent matches, but it will not match HARVARD BIOLOGY,
even when I keep all the match scores at zero.
I realize this is not a great example, but I want to understand why it would not pick up a match on HARVARD BIOLOGY.
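Since both names share the same block, the pair should at least be compared. Whether it is reported as a match then depends on how the name field is scored. As a rough illustration only, the sketch below uses Python's `difflib.SequenceMatcher` and a simple word-overlap measure. These are swapped-in techniques, not QualityStage comparison types, and the helper names are hypothetical. They show that the two raw names agree only partially.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    # Character-level similarity in [0, 1] from the standard library;
    # a stand-in for illustration, not a QualityStage comparison.
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def token_overlap(a: str, b: str) -> float:
    # Fraction of words in the shorter name that also appear in the other name.
    ta, tb = set(a.upper().split()), set(b.upper().split())
    return len(ta & tb) / min(len(ta), len(tb))


sim = name_similarity("HARVARD UNIV", "HARVARD BIOLOGY")
overlap = token_overlap("HARVARD UNIV", "HARVARD BIOLOGY")
print(round(sim, 2))  # 0.67
print(overlap)        # 0.5
```

How a partial agreement like this translates into a match weight depends on the comparison type and cutoffs configured in the match specification, which this sketch does not model.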