Survivorship Stage Process

Infosphere's Quality Product

DCSD
Premium Member
Posts: 39
Joined: Tue Apr 12, 2011 8:36 am

Post by DCSD

Hello - I created a few mappings for the Entity Resolution Process all the way to the final stage of survivorship. The survivorship rules were based on the match type = MP and the longest first name, last name, middle name. I used the set_id to link the data in the survivorship table SURVIVE_NAME back to the original table that has the full set of names PERSON_NAME. Now I am trying to figure out how to handle new data coming in.
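
One possible reading of that survivorship rule, sketched in Python for illustration only (this is not QualityStage's own rule syntax). Within each set, every name field keeps its longest value, and a tie in length goes to the record whose match type is MP. The record layout (a dict with set_id, match_type, first, middle, last) is an assumption.

```python
from collections import defaultdict

NAME_FIELDS = ("first", "middle", "last")


def survive_names(rows):
    """Return {set_id: {"first": ..., "middle": ..., "last": ...}}.

    Assumed rule: for each name field, keep the longest value in the set;
    when two values are the same length, prefer the record with match_type "MP".
    """
    # Group the matched records by the set_id produced by the match run.
    groups = defaultdict(list)
    for row in rows:
        groups[row["set_id"]].append(row)

    survived = {}
    for set_id, members in groups.items():
        best = {}
        for field in NAME_FIELDS:
            # Longest value wins; on equal length, the MP record wins.
            winner = max(
                members,
                key=lambda r: (len(r[field] or ""), r["match_type"] == "MP"),
            )
            best[field] = winner[field]
        survived[set_id] = best
    return survived
```

Keeping the length and the MP flag in one tuple key lets a single max() call apply both parts of the rule.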

I will standardize the new data, but I don't know what to do in regard to matching. I have 700 million records and don't want to match all of them every time I run my jobs. If I run the new data and try to match it against the survived records, my set_id will change, and that will break the original link between my SURVIVE_NAME table and the PERSON_NAME table. How do I maintain this relationship between the SURVIVE_NAME table and the PERSON_NAME table while also accommodating new data coming in?

I feel like this should be a standard process that people go through for Entity Resolution, so I welcome any suggestions on how to handle it. Thanks.
boxtoby
Premium Member
Posts: 138
Joined: Mon Mar 13, 2006 5:11 pm
Location: UK

Post by boxtoby

I don't know if you have fixed this one yet, but you could try something like this:

(I assume that you have a database of 700m records to which you are trying to add new records, but you first check whether they already exist and, if they do, don't add them.)

Firstly, you need to reduce the 700m records to a more manageable set of candidates. I would do this with a lookup from the new data to the database on something like city name, which should be fairly reliable.
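
The city lookup is what record-linkage people call blocking: each new record is compared only with existing records that share a block key. Below is a minimal Python sketch. It assumes each record is a dict with a hypothetical city field. In practice the index would live in the database (an indexed column or a lookup stage) rather than in memory.

```python
from collections import defaultdict


def city_key(rec):
    """Hypothetical block key: city name, trimmed and upper-cased."""
    return (rec.get("city") or "").strip().upper()


def build_block_index(existing, block_key):
    """Group existing records by block key (done once, reused for each load)."""
    index = defaultdict(list)
    for rec in existing:
        index[block_key(rec)].append(rec)
    return index


def candidate_pairs(new_records, index, block_key):
    """Yield (new, existing) pairs that share a block key.

    Only these pairs go on to the expensive match comparison, instead of
    comparing every new record against the whole database.
    """
    for new in new_records:
        # .get() avoids inserting empty lists into the defaultdict.
        for old in index.get(block_key(new), []):
            yield new, old
```

A single city key can miss matches where the city is misspelled or missing, so it may be worth running more than one blocking pass (for example, city and then postcode) and combining the candidates.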

Secondly, the join back to your SURVIVE_NAME table from the PERSON_NAME table should use a key allocated to the tables when the records were added. The set_id is always a transitory value: as you have found, it can change every time the matching is re-run.
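
Here is a Python sketch of how a persistent key might survive a re-match. All of the following are hypothetical assumptions: each PERSON_NAME row has a stable record_id, a persisted mapping record_key holds a person_key for every row matched in earlier runs, and a fresh match run reports its sets as {set_id: [record_id, ...]}. Each new set inherits the person_key of its already-keyed members. A set with no keyed members gets a fresh key. A set that pulls in two different existing keys is returned for review rather than being merged automatically.

```python
import itertools


def assign_persistent_keys(set_members, record_key, next_key):
    """Carry person_keys across match runs so set_id can change freely.

    set_members: {set_id: [record_id, ...]} from the latest match run.
    record_key:  {record_id: person_key} from earlier runs (updated in place).
    next_key:    iterator yielding fresh surrogate keys.
    Returns [(set_id, [keys...])] for sets spanning more than one existing key.
    """
    conflicts = []
    for set_id, members in set_members.items():
        # Keys already held by members of this set from earlier runs.
        existing = {record_key[r] for r in members if r in record_key}
        if not existing:
            key = next(next_key)  # brand-new entity
        else:
            key = min(existing)  # arbitrary but deterministic choice
            if len(existing) > 1:
                conflicts.append((set_id, sorted(existing)))
        # Only unkeyed records get a key; existing links are never rewritten.
        for r in members:
            record_key.setdefault(r, key)
    return conflicts


if __name__ == "__main__":
    record_key = {"p1": 100, "p2": 100, "p3": 101}
    new_run = {7: ["p1", "p2", "p4"], 8: ["p5"], 9: ["p3", "p6"]}
    print(assign_persistent_keys(new_run, record_key, itertools.count(200)))
    print(record_key)
```

SURVIVE_NAME and PERSON_NAME would then join on person_key, and set_id would only be used inside a single run.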

This sort of exercise is all about key persistence. I would try to preserve the new data's key in the target database, so you can link back to the source, and store the target database's key in the database holding the new data. Then you have both forward and backward links.
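
One way to hold both forward and backward links is a single cross-reference table. This is a minimal sketch using Python's built-in sqlite3 module. The table and column names (key_xref, source_system, source_key, target_key) are hypothetical, and a real version would sit in your target database.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS key_xref (
    source_system TEXT    NOT NULL,  -- which feed the record came from
    source_key    TEXT    NOT NULL,  -- key in the incoming data
    target_key    INTEGER NOT NULL,  -- persistent key in the target database
    PRIMARY KEY (source_system, source_key)
);
CREATE INDEX IF NOT EXISTS ix_key_xref_target ON key_xref (target_key);
"""


def open_xref(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_link(conn, source_system, source_key, target_key):
    """Store or update one link (call conn.commit() when using a file)."""
    conn.execute(
        "INSERT OR REPLACE INTO key_xref VALUES (?, ?, ?)",
        (source_system, source_key, target_key),
    )


def forward(conn, source_system, source_key):
    """Source record -> target key, or None if the record is not linked yet."""
    row = conn.execute(
        "SELECT target_key FROM key_xref WHERE source_system = ? AND source_key = ?",
        (source_system, source_key),
    ).fetchone()
    return row[0] if row else None


def backward(conn, target_key):
    """Target key -> every (source_system, source_key) linked to it."""
    return conn.execute(
        "SELECT source_system, source_key FROM key_xref "
        "WHERE target_key = ? ORDER BY 1, 2",
        (target_key,),
    ).fetchall()
```

The primary key makes the forward lookup a single-row fetch. The index on target_key keeps the backward lookup cheap even on a large table.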

If you have a chat with a DBA, they should be familiar with this sort of exercise.

Hope that helps!

Bob.
Bob Oxtoby