Murali4u wrote: Thanks Ray and Stuart. I have done all the things that you both have sorted out, but out of curiosity I'm just asking how to cleanse and correct the data values. That's it. Thanks for your comments bro!

At a basic level, you could take the data that parses out correctly and put it back into the correct fields in a systematic manner.
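As a rough sketch (in Python rather than QualityStage, with invented field names and a made-up pattern-to-field mapping), "putting correctly parsed data back into the correct fields" might look like:

```python
# Hypothetical sketch: once tokens have parsed cleanly against a known pattern,
# write them back into standard fields in one systematic pass.
# The pattern notation and field names are invented for illustration.

def reassign(tokens, pattern, field_map):
    """Map each parsed token to its target field based on the matched pattern."""
    if pattern not in field_map:
        return None  # no rule for this pattern: leave the record for review
    return dict(zip(field_map[pattern], tokens))

# e.g. "^+T" = number, unclassified word, street type
field_map = {"^+T": ["house_no", "street_name", "street_type"]}

record = reassign(["123", "MAIN", "ST"], "^+T", field_map)
print(record)  # {'house_no': '123', 'street_name': 'MAIN', 'street_type': 'ST'}
```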
That's assuming it was parsed correctly, of course. You can only know that by looking at both the records that "pass" and the records that "fail".
Junk can pass if it happens to satisfy a pattern, or partial patterns can strip the wrong bits out because the available patterns don't cover the patterns actually present in your data.
Good data can fail because the ruleset doesn't have a pattern for it, or because a word is missing from a classification table and the record falls into the wrong pattern.
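To make that second failure mode concrete, here's a toy pattern generator in Python. The class codes loosely follow QualityStage conventions (^ = number, T = street type, + = unclassified word), but the table contents and logic are invented for illustration:

```python
# Sketch of how one missing classification entry changes the generated pattern.
# Hypothetical classification table: word -> class code.
classification = {"ST": "T", "AVE": "T", "RD": "T"}

def pattern(tokens):
    """Build a pattern string: ^ for numbers, table code if classified, else +."""
    return "".join(
        "^" if t.isdigit() else classification.get(t, "+")
        for t in tokens
    )

print(pattern(["123", "MAIN", "ST"]))   # ^+T -> matches the street-address rule
print(pattern(["123", "MAIN", "STR"]))  # ^++ -> "STR" isn't in the table, so the
                                        #        record lands in the wrong bucket
```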
If it fails when it shouldn't or passes when it shouldn't, you can try to work on the ruleset.
All I'll say about that is: do some analysis on the patterns that pass and fail incorrectly, look for common types to work from, and plan a way of attacking enough of them to make the results acceptable. You could write a book on actually doing it. It's more of the "art" that you can't convey over an email; that just comes from experience, getting in and doing it.
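The analysis step is the easy part to show: group the failing records by their generated pattern and attack the biggest buckets first. A minimal sketch, with invented pattern strings as input:

```python
# Sketch: frequency analysis of the patterns behind the failures, so the
# most common unhandled patterns get fixed first. Data is invented.
from collections import Counter

failed_patterns = ["^++", "^++", "^+T+", "^++", "+^T", "^+T+"]

for pat, count in Counter(failed_patterns).most_common():
    print(f"{pat}: {count} records")
# ^++: 3 records
# ^+T+: 2 records
# +^T: 1 records
```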
For the stuff that fails and is "supposed to" fail (and remember, that's not entirely up to you or QS either), it's really up to the business to guide you on what to do.
Do you guess? (don't recommend that one ;-) )
Do you try to do some sort of matching to a reference set in order to "correct" it automatically?
Do you leave it alone?
Do you just drop it?
Do you get them to fix it themselves?
Remember, it's their data and they get to decide.
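Of the options above, matching to a reference set is the one that lends itself to a quick sketch. Assuming you have a trusted list to match against (the list, the cutoff, and the helper name here are all invented), the idea is to snap a failing value to its closest reference entry only when the match is confident enough, and otherwise hand it back:

```python
# Sketch of the "match to a reference set" option: correct a failing value to
# the closest entry in a trusted list, but only above a similarity cutoff.
import difflib

reference = ["MAIN STREET", "HIGH STREET", "PARK AVENUE"]

def auto_correct(value, cutoff=0.8):
    """Return the best reference match above the cutoff, else None."""
    matches = difflib.get_close_matches(value, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else None  # None -> send it back to the business

print(auto_correct("MAIN STRET"))  # MAIN STREET
print(auto_correct("QZX"))         # None
```

The cutoff is the key design decision: too low and you "correct" values into the wrong reference entry (silent data corruption), too high and you correct nothing. That threshold is a business call, not a technical one, which brings it back to the point above.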