Blocking and matching

Posted: Mon Dec 21, 2009 7:33 pm
by kkumardatastage
Hi

Please help me with this issue.

I am using the column First Name for blocking in pass 1, and I would like to use the same column First Name for matching in that same pass. Is it good practice to use the same column for both blocking and matching?

Thanks
k

Posted: Mon Dec 21, 2009 9:41 pm
by ray.wurlod
Use a phonetic equivalent (e.g. NYSIIS of FirstName) as the blocking column. This will allow for slight misspellings when you match on FirstName.
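To illustrate the idea outside QualityStage, here is a minimal Python sketch. It uses Soundex rather than NYSIIS because Soundex is shorter to write out; the principle is the same. The function names `soundex` and `candidate_pairs` and the sample names are assumptions made for this example, not anything the Match stage exposes.

```python
from collections import defaultdict
from itertools import combinations

# Standard American Soundex letter groups (vowels, H, W and Y have no code).
_CODES = {
    ch: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for ch in letters
}


def soundex(name):
    """Return the 4-character Soundex code of a name ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (result + "000")[:4]


def candidate_pairs(names):
    """Block on the Soundex code and return the pairs left for matching."""
    blocks = defaultdict(list)
    for name in names:
        blocks[soundex(name)].append(name)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


names = ["Stephen", "Steven", "Stephan", "Robert", "Rupert", "Maria"]
for a, b in candidate_pairs(names):
    # Each pair shares a phonetic code, but the spellings can still differ,
    # so a match on the actual FirstName has something to score.
    print(soundex(a), a, b)
```

Here "Stephen", "Steven" and "Stephan" share one block and "Robert" and "Rupert" share another, so the differently spelled names still reach the matching step.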

Depending on your data volume and value frequencies, FirstName alone may cause block sizes to be larger than one might like - you may benefit from investigating other blocking and matching fields in this pass.
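A rough way to see the block size effect is to count how many record pairs each blocking choice leaves to compare. The Python sketch below does this on made-up data; `block_report` is a hypothetical helper and `birth_year` is a hypothetical second blocking column chosen only for illustration.

```python
from collections import Counter


def block_report(keys):
    """Return (block sizes, total candidate pairs) for a list of blocking keys."""
    sizes = Counter(keys)
    # A block of n records produces n * (n - 1) / 2 pairs to compare.
    pairs = sum(n * (n - 1) // 2 for n in sizes.values())
    return sizes, pairs


# Made-up records: (first_name, birth_year)
records = [
    ("JOHN", 1970), ("JOHN", 1970), ("JOHN", 1982), ("JOHN", 1982),
    ("JOHN", 1991), ("MARY", 1965), ("MARY", 1965), ("MARY", 1988),
    ("ZELDA", 1970),
]

name_sizes, name_pairs = block_report([name for name, _ in records])
both_sizes, both_pairs = block_report(records)

print("first name only:", name_sizes.most_common(1), name_pairs, "pairs")
print("first name + birth year:", both_sizes.most_common(1), both_pairs, "pairs")
```

On this sample the first name alone leaves 13 pairs to compare, dominated by the "JOHN" block, while adding the second column leaves 3. The cost is that records with an error in the second column no longer meet in this pass, which is one reason to use several passes.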

Posted: Tue Dec 22, 2009 8:17 am
by stuartjvnorton
As Ray says, it's better to use a NYSIIS or Soundex of the name to block on (as one of a number of blocking fields), before matching on the actual name.
Remember, the idea of a blocking field is to quickly get rid of the record pairs that definitely don't match.

So you'd never use the exact same field for both blocking and matching. If you did, then you'd only ever be scoring exact matches for that field (with minor differences for frequency distribution), making it a waste of time.
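This can be checked with a small Python sketch. The helper `pairs_within_blocks` and the sample names are assumptions for illustration, and the first-letter key is only a crude stand-in for a phonetic code, not a recommendation.

```python
from collections import defaultdict
from itertools import combinations


def pairs_within_blocks(values, key):
    """Group values by key(value) and return every pair inside each group."""
    blocks = defaultdict(list)
    for value in values:
        blocks[key(value)].append(value)
    return [pair for members in blocks.values() for pair in combinations(members, 2)]


names = ["Jon", "John", "John", "Jhon", "Mary", "Marie"]

# Blocking on the exact field: only identical values ever meet.
exact = pairs_within_blocks(names, key=lambda n: n)

# Blocking on a looser key (first letter, a placeholder for NYSIIS/Soundex).
loose = pairs_within_blocks(names, key=lambda n: n[0].upper())

print("exact blocking:", exact)
print("loose blocking:", len(loose), "pairs,",
      sum(a != b for a, b in loose), "with different spellings")
```

With exact blocking the only pair produced is the two identical "John" values, so a match on the same field has nothing left to discriminate on. With the looser key, "Jon", "John" and "Jhon" are compared with each other and the match on the actual name does useful work.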