Blocking and matching

Infosphere's Quality Product

Moderators: chulett, rschirm

kkumardatastage
Participant
Posts: 84
Joined: Sat Jul 19, 2008 8:50 am

Post by kkumardatastage »

Hi

Please help me with this issue.

I am using the column First Name for blocking in pass 1, and I would like to use the same column First Name for matching in that same pass. Is it a good idea to block and match on the same column?

Thanks
k
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use a phonetic equivalent (e.g. NYSIIS of FirstName) as the blocking column. This will allow for slight misspellings when you match on FirstName.
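
To illustrate the idea outside the tool, here is a minimal Python sketch of blocking on a phonetic key. It uses Soundex as a stand-in because it is short enough to show in full; NYSIIS would play the same role. The names `soundex` and `block_by`, and the sample names, are hypothetical and are not part of QualityStage.

```python
from collections import defaultdict

# Standard American Soundex digit groups.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code of a name ("" if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return "".join(out)[:4].ljust(4, "0")


def block_by(values, key_fn):
    """Group values into blocks that share the same blocking key."""
    blocks = defaultdict(list)
    for value in values:
        blocks[key_fn(value)].append(value)
    return dict(blocks)


names = ["Stephen", "Steven", "Stefan", "John", "Jon", "Mary"]
blocks = block_by(names, soundex)
for key, members in blocks.items():
    print(key, members)
```

The spelling variants end up in the same block, so the match step still gets to compare "Stephen" with "Steven" on the actual FirstName value.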

Depending on your data volume and value frequencies, FirstName alone may cause block sizes to be larger than one might like - you may benefit from investigating other blocking and matching fields in this pass.
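
A rough way to see the block-size concern: a block of n records produces n*(n-1)/2 candidate pairs, so a few very common first names can dominate the work. The sketch below counts candidate pairs for two blocking choices. The record layout, the `postcode` field and the `candidate_pairs` helper are assumptions made up for illustration only.

```python
from collections import Counter


def candidate_pairs(block_keys):
    """Total record pairs compared when records sharing a key form a block."""
    sizes = Counter(block_keys)
    return sum(n * (n - 1) // 2 for n in sizes.values())


# Hypothetical sample data: four JOHNs and two MARYs across two postcodes.
records = [
    {"first": "JOHN", "postcode": "2000"},
    {"first": "JOHN", "postcode": "2000"},
    {"first": "JOHN", "postcode": "3000"},
    {"first": "JOHN", "postcode": "3000"},
    {"first": "MARY", "postcode": "2000"},
    {"first": "MARY", "postcode": "3000"},
]

by_name = candidate_pairs(r["first"] for r in records)
by_name_and_postcode = candidate_pairs(
    (r["first"], r["postcode"]) for r in records
)

print("first name only:", by_name)
print("first name + postcode:", by_name_and_postcode)
```

Adding a second blocking field cuts the pair count here from 7 to 2. The trade-off is that records disagreeing on the extra field never meet in this pass, which is why such fields are usually covered by a different pass.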
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

As Ray says, it's better to use a NYSIIS or Soundex of the name to block on (as one of a number of blocking fields), before matching on the actual name.
Remember, the idea of a blocking field is to get rid of the ones that definitely don't match and do it quickly.

So you'd never use the exact same field for both blocking and matching. If you did, then you'd only ever be scoring exact matches for that field (with minor differences for frequency distribution), making it a waste of time.
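
A small sketch of why that is, under stated assumptions: `difflib.SequenceMatcher` stands in for the match comparison, and blocking on the first letter stands in for a looser phonetic key. Neither is what QualityStage does internally, and the frequency-based weighting mentioned above is not modelled. The `pair_scores` helper is hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def pair_scores(names, block_key):
    """Score every pair of names that falls into the same block."""
    blocks = defaultdict(list)
    for name in names:
        blocks[block_key(name)].append(name)
    scores = {}
    for members in blocks.values():
        for a, b in combinations(members, 2):
            scores[(a, b)] = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return scores


names = ["Stephen", "Steven", "Stephen", "Jon", "John", "Jon"]

# Blocking on the exact value: only identical names are ever compared.
exact = pair_scores(names, lambda n: n.upper())

# Blocking on a looser key: near-misses reach the comparison too.
loose = pair_scores(names, lambda n: n[0].upper())

print("exact blocking:", exact)
print("loose blocking:", loose)
```

With exact blocking every scored pair is a perfect agreement, so the comparison adds no information. With the looser key, "Stephen" and "Steven" are compared and get a partial score that the match step can weigh.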