
Spelling corrections in Address

Posted: Tue Mar 02, 2010 12:47 pm
by pklcnu
Dear Experts

I have three questions regarding address cleansing:

1) I have been given a reference table (from the Postal Department Netherlands) which contains the standard postcodes, street names, etc.

Is it possible to use this reference table in QualityStage for address standardization? If so, how?

2) Is it possible to correct street names that contain spelling mistakes using the above approach? If so, how?

3) If I have data like "J.R. Accounts", which is incorrect (the correct value is "J.R.S. Accounts"), is it possible to cleanse this type of data?


The software that we have doesn't have a rule set for the Netherlands, and I have been asked to use the reference table mentioned above.

Any help and ideas will be much appreciated.

Many thanks in advance

Posted: Tue Mar 02, 2010 2:57 pm
by ray.wurlod
1) Yes, but it would be easier once you build NLADDR and NLAREA (and maybe NLPREP) rule sets.

2) Typically, identify potential duplicates by matching on phonetic equivalents and as much other information as you have available, and specify that the reference table is the accurate one.
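A minimal Python sketch of the "use other information, trust the reference table" idea, assuming the reference table can be loaded as a postcode-to-street-names mapping. The postcodes, street names, `correct_street` function and the 0.8 threshold are all hypothetical, and a plain string-similarity ratio (`difflib.SequenceMatcher`) stands in for a phonetic comparison. Restricting candidates to the same postcode plays the role of the "other information" (a blocking key):

```python
from difflib import SequenceMatcher

# Hypothetical extract of the postal reference table: postcode -> official street names.
REFERENCE = {
    "1234AB": ["Kerkstraat"],
    "5678CD": ["Stationsweg", "Molenstraat"],
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def correct_street(postcode: str, street: str, threshold: float = 0.8):
    """Return (street, corrected_flag), treating the reference table as the accurate source.

    Only reference streets sharing the input postcode are compared (blocking),
    so a misspelling is never "corrected" to a street in another area.
    """
    candidates = REFERENCE.get(postcode.replace(" ", "").upper(), [])
    best = max(candidates, key=lambda c: similarity(street, c), default=None)
    if best is not None and similarity(street, best) >= threshold:
        return best, best != street
    # No sufficiently similar reference value: leave the input untouched.
    return street, False


print(correct_street("1234 ab", "Kerkstrat"))
```

Inputs below the threshold are returned unchanged, so they can be routed to a clerical review instead of being silently altered.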

3) This is your NLNAME rule set (though any "name" rule set will probably work). Again the technique is matching on phonetic equivalent and as much other information as you have available. Now, though, you need a Survivorship rule to specify which one is correct.
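As a rough illustration of what a survivorship rule decides once matching has grouped records together, here is a Python sketch. The `Record` type, the `"reference"`/`"input"` source labels and the rule itself (a reference record wins, otherwise the longest, most complete name survives) are assumptions for this example, not QualityStage's Survive stage syntax:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Record:
    source: str  # hypothetical label: "reference" or "input"
    name: str


def survive(group: list[Record]) -> str:
    """Pick the surviving name for a group of records already matched as duplicates.

    Hypothetical rule: a value from the reference source always wins;
    otherwise keep the longest (assumed most complete) name.
    """
    reference = [r.name for r in group if r.source == "reference"]
    if reference:
        return reference[0]
    return max((r.name for r in group), key=len)


group = [Record("input", "J.R. Accounts"), Record("reference", "J.R.S. Accounts")]
print(survive(group))
```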

Re: Spelling corrections in Address

Posted: Wed Mar 03, 2010 2:37 am
by stuartjvnorton
1) You need to create your own rule set(s) in order to do proper parsing and standardisation in QS. You could use these reference tables in your classification file.
Maybe you can modify DEAREA and DEADDR if the basic structure is close enough to German addresses (and I have no idea, so don't quote me).
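To show conceptually why loading the reference street names into a classification table helps the parse, here is a Python sketch that tokenises an address and builds a pattern of single-character class codes. This is not QualityStage's classification file format: the `CLASSIFICATION` entries, the `S` class for reference street names and the `classify` function are hypothetical, with `^` used for numbers and `?` for unknown words purely as illustration:

```python
import re

# Hypothetical classification entries built from the postal reference table:
# token -> (standard value, class). The 'S' class letter is an assumption.
CLASSIFICATION = {
    "KERKSTRAAT": ("KERKSTRAAT", "S"),
    "MOLENSTRAAT": ("MOLENSTRAAT", "S"),
    "STATIONSWEG": ("STATIONSWEG", "S"),
}


def classify(address: str):
    """Tokenise an address and return (pattern, standardised tokens).

    Numbers get '^', tokens found in CLASSIFICATION get their class and
    standard value, and anything else gets '?' (unknown).
    """
    tokens = re.findall(r"[A-Za-z]+|\d+", address.upper())
    pattern, standardised = [], []
    for tok in tokens:
        if tok.isdigit():
            pattern.append("^")
            standardised.append(tok)
        elif tok in CLASSIFICATION:
            standard, cls = CLASSIFICATION[tok]
            pattern.append(cls)
            standardised.append(standard)
        else:
            pattern.append("?")
            standardised.append(tok)
    return "".join(pattern), standardised


print(classify("Kerkstraat 12"))
print(classify("Kerkstrat 12"))
```

The misspelled input falls through to the unknown class, which is exactly the gap that the fuzzy or phonetic matching discussed elsewhere in this thread would need to fill.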

2) Correction and standardisation are two rather different things.
It's easy to say Avenue = Av = Ave for standard terms (and the parse step helps tell you whether something is a standard term or just a word), but if you start changing things like the street name, it should be because your reference files give you a way to know, or you make one hell of a guess.
Depends what you are getting in your "etc". ;-)
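A small Python sketch of the distinction drawn above: known standard terms are mapped to one standard form (using the Avenue = Av = Ave example from the post), while every other word, including a misspelled street name, is deliberately left alone. The `STANDARD_TERMS` table and `standardise` function are hypothetical names for this illustration:

```python
# Equivalences for standard terms, taken from the example Avenue = Av = Ave.
STANDARD_TERMS = {"AVENUE": "AVE", "AV": "AVE", "AVE": "AVE"}


def standardise(address: str) -> str:
    """Replace known standard terms with their standard form.

    Any other token (for example the street name itself) is kept as-is:
    changing it would be correction, which needs reference data to justify it.
    """
    out = []
    for tok in address.upper().split():
        key = tok.rstrip(".")  # treat "Av." the same as "Av"
        out.append(STANDARD_TERMS.get(key, tok))
    return " ".join(out)


print(standardise("12 Park Av."))
print(standardise("12 Prak Avenue"))
```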

3) If you have a reference set that has the full list of correct values, then you could try to match against it to find the most likely option.
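A minimal Python sketch of matching against a reference list to find the most likely value, assuming such a list of correct names exists. The names in `REFERENCE_NAMES`, the `best_reference_match` function and the 0.8 cutoff are hypothetical; `difflib.get_close_matches` from the standard library does the similarity ranking:

```python
from difflib import get_close_matches

# Hypothetical reference list of correct company names.
REFERENCE_NAMES = ["J.R.S. Accounts", "J.P. Consulting", "Van Dijk Transport"]


def best_reference_match(value: str, cutoff: float = 0.8):
    """Return the closest reference value, or None if nothing is similar enough."""
    matches = get_close_matches(value, REFERENCE_NAMES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(best_reference_match("J.R. Accounts"))
```

The cutoff is the "best guess" knob: raising it reduces wrong corrections at the cost of leaving more values unmatched.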

Cleansing, correction, standardisation: everything you're talking about is based on understanding context and having good enough reference data to either know for sure or make the best guess we can.
If you don't have good enough reference data, chances are your guesses won't be good enough either. ;-)

Hope this helps. :-)

Posted: Wed Mar 03, 2010 4:53 am
by JoshGeorge
Have you tried MNS and then doing a reference lookup against the given tables? For spelling mistake correction, Soundex matching might be an easy way.
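For reference, a self-contained Python sketch of the classic American Soundex code plus a lookup of reference values sharing the same code. The `soundex_candidates` helper and the example street names are hypothetical. Soundex was designed for English surnames, so how well it groups Dutch street-name misspellings is an assumption worth testing on real data:

```python
# Letter-to-digit groups of American Soundex; vowels, Y, H and W get no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of a word ('' if no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after a vowel is kept
        if len(result) == 4:
            break
    return (result + "000")[:4]


def soundex_candidates(word: str, reference: list) -> list:
    """Reference values whose Soundex code equals that of the input word."""
    target = soundex(word)
    return [r for r in reference if soundex(r) == target]


print(soundex("Robert"), soundex("Ashcraft"))
print(soundex_candidates("Kerkstrat", ["Kerkstraat", "Molenstraat"]))
```

Soundex only proposes candidates; several reference streets can share a code, so a tie-breaker such as postcode is still needed.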

Posted: Wed Mar 03, 2010 11:56 am
by pklcnu
Thanks for your suggestions. Will let you know the outcome soon.