Address Cleansing

Infosphere's Quality Product

Moderators: chulett, rschirm

stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Murali4u wrote: Thanks Ray and Stuart. I have done all the things that you both sorted out, but out of curiosity I'm just asking how to cleanse and correct the data values. That's it. Thanks for your comments bro :) :)
At a basic level, you could take the data that parses out correctly and put it back into the correct fields in a systematic manner.

That assumes it was parsed correctly, of course. You can only know that by looking at both the stuff that "passes" and the stuff that "fails".
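To make the "put it back into the correct fields" idea concrete, here is a minimal sketch in plain Python. It is not QualityStage and not what a rule set actually does: the regex and the refield function are made up for illustration, and the HouseNumber and StreetName column names are borrowed from the original question. Anything that doesn't parse is returned untouched and flagged so it can be reviewed.

```python
import re

# Hypothetical pattern: a house number (optional letter suffix) followed by
# a street name that starts with a letter. Real data needs far more patterns.
ADDRESS_RE = re.compile(
    r"^\s*(?P<house>\d+[A-Za-z]?)\s+(?P<street>[A-Za-z][A-Za-z .'-]*?)\s*$"
)

def refield(record):
    """Return (record, parsed_ok); on failure the record is left unchanged."""
    # Combine the two columns, skipping empty or missing values.
    raw = " ".join(
        part.strip()
        for part in (record.get("HouseNumber"), record.get("StreetName"))
        if part and part.strip()
    )
    match = ADDRESS_RE.match(raw)
    if match is None:
        return dict(record), False  # keep for manual review
    cleaned = dict(record)
    cleaned["HouseNumber"] = match.group("house")
    cleaned["StreetName"] = match.group("street")
    return cleaned, True

if __name__ == "__main__":
    print(refield({"HouseNumber": "12 High Street", "StreetName": ""}))
    print(refield({"HouseNumber": "12", "StreetName": "12 High Street"}))
```

Note that the second record (house number repeated in both columns) fails this single pattern. That is deliberate: it lands in the "fails" pile where you can look at it, rather than being silently mangled.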

Junk can pass if it manages to satisfy a pattern, or a partial pattern can strip the wrong bits out because the available patterns don't cover the patterns in your data.
Good stuff can fail because the ruleset doesn't have a pattern for it, or because a word isn't in a classification table, which makes it fall into the wrong pattern.

If it fails when it shouldn't or passes when it shouldn't, you can try to work on the ruleset.
All I'll say about that is to do some analysis on the patterns that pass and fail incorrectly, look for common types to work from, and plan a way to attack enough of them to make the results acceptable. You could write a book on actually doing it. That's more of the "art" that you can't get across in an email. It's just experience from getting in and doing it.
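As a rough illustration of that kind of pattern analysis, the sketch below reduces each address to a string of token classes and counts how often each pattern occurs, so you can see which patterns are worth attacking first. The class letters (N, A, T, M) and the tiny STREET_TYPES table are simplified stand-ins invented for this example; they are not the QualityStage classification codes or tables.

```python
from collections import Counter

# Hypothetical, deliberately tiny classification table.
STREET_TYPES = {"ST", "STREET", "RD", "ROAD", "AVE", "AVENUE"}

def token_class(token):
    """Map one token to a single-letter class."""
    upper = token.upper().strip(".,")
    if upper in STREET_TYPES:
        return "T"  # known street type
    if upper.isdigit():
        return "N"  # purely numeric
    if upper.isalpha():
        return "A"  # purely alphabetic, not in the table
    return "M"      # mixed or anything else

def pattern_of(address):
    """Build the pattern string for one address line."""
    return "".join(token_class(tok) for tok in address.split())

def pattern_frequencies(addresses):
    """Return (pattern, count) pairs, most common first."""
    return Counter(pattern_of(a) for a in addresses).most_common()

if __name__ == "__main__":
    sample = ["12 High Street", "7 Mill Road", "Flat 2B High St."]
    for pattern, count in pattern_frequencies(sample):
        print(pattern, count)
```

Run the same count separately over the records that passed and the ones that failed, then eyeball a few examples of each frequent pattern. A word missing from the table shows up as an "A" where you expected a "T".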

For the stuff that fails and is "supposed to" (and remember that's not entirely up to you or QS either), it's really up to the business to guide you on what to do.
Do you guess? (don't recommend that one ;-))
Do you try to do some sort of matching to a reference set in order to "correct" it automatically?
Do you leave it alone?
Do you just drop it?
Do you get them to fix it themselves?
Remember, it's their data and they get to decide.
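If the business does choose the "match to a reference set" option, one very simple way to picture it is fuzzy matching against a list of known street names. The sketch below uses Python's standard difflib module. The suggest_street function, the reference list and the 0.85 cutoff are all assumptions for illustration, and a real address validation module does far more than this.

```python
import difflib

def suggest_street(name, reference_streets, cutoff=0.85):
    """Return the closest reference street name, or None if nothing is close enough."""
    # Compare in upper case but return the reference spelling as given.
    lookup = {ref.upper(): ref for ref in reference_streets}
    matches = difflib.get_close_matches(
        name.strip().upper(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None

if __name__ == "__main__":
    reference = ["High Street", "Mill Road"]  # hypothetical reference set
    print(suggest_street("Hgh Street", reference))
    print(suggest_street("Unknown Lane", reference))
```

A high cutoff keeps this on the cautious side: anything without a close match comes back as None and stays in the pile that the business has to decide on.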
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Re: Address Cleansing

Post by JoshGeorge »

xxPREP will help you to separate your country's addresses from the others. After this, the rule sets (xxAREA & xxADDR) will help you to form the address into an area domain and an address domain, according to the fields you have passed into the ZQ sections. To really check this address against your country's address database, you will have to do a lookup using the QualityStage module for that country, which validates and corrects addresses and transforms them into layouts that conform to standards.

pklcnu wrote: Dear Experts

I need to do address cleansing. The table has different columns for the address, like HouseNumber, StreetName, Location, PostCode.

The data in these columns is all mixed up. The following are some of the ways the data has been entered in the columns:
1) The full address is entered in the HouseNumber column
2) The house number and street name are mixed up in one or the other column
3) The house number is present in both the HouseNumber and StreetName columns
4) Some of the values in the columns are entered correctly

How exactly should I clean this type of data? What stages should I use for this, and how?

Please help me with this. Many thanks
Joshy George