QualityStage does have Soundex, but you are not using it as your blocking columns. Did you intend to? Soundex is Soundex, no matter which vendor it comes from; the algorithm is in the public domain.
NYSIIS is a more sophisticated algorithm than Soundex (primarily because it looks at more characters and does not lose the vowels), and could not reasonably be expected to generate the same results as Soundex - otherwise why bother having two algorithms at all?
Data Matching: QualityStage vs Oracle’s Soundex
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 4
- Joined: Mon Sep 29, 2008 12:41 am
- Location: Manila, Philippines
- Contact:
Hi,
In our version (8.0.1) there is no Soundex, we only have Reverse Soundex (RVSNDX) and NYSIIS. (I will ask the IBM support regarding this).
I didn't use that because there are Last Name ends with different letter e.g. "Fernando Gonzales" should match with "Fernando Gonzalez"
I use QS to get the most accurate results. And I use Soundex for testing purposes only. Soundex result sometimes not that accurate e.g. "John Garcia" match with "Jane Garso".
In my understanding, the "Residual" are the unique records. But when I tested the result (using the Soundex), I found several duplicates e.g.
Customer# FirstName LastName State
752 Michelle Cortel Texas
254 Michelle Cortel Texas
Even though I dont use the soundex, I could find duplicates in Residual.
Thanks!
In our version (8.0.1) there is no Soundex, we only have Reverse Soundex (RVSNDX) and NYSIIS. (I will ask the IBM support regarding this).
I didn't use that because there are Last Name ends with different letter e.g. "Fernando Gonzales" should match with "Fernando Gonzalez"
I use QS to get the most accurate results. And I use Soundex for testing purposes only. Soundex result sometimes not that accurate e.g. "John Garcia" match with "Jane Garso".
In my understanding, the "Residual" are the unique records. But when I tested the result (using the Soundex), I found several duplicates e.g.
Customer# FirstName LastName State
752 Michelle Cortel Texas
254 Michelle Cortel Texas
Even though I dont use the soundex, I could find duplicates in Residual.
Thanks!
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
You DO have Soundex - it's just that the rule set you're using does not use it to generate any matching field. You could create a custom rule set that does use it, if that's what you want.marionpt wrote:In our version (8.0.1) there is no Soundex, we only have Reverse Soundex (RVSNDX) and NYSIIS.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 527
- Joined: Thu Apr 19, 2007 1:25 am
- Location: Melbourne
Yeah, seems kind of pointless.cristty wrote:Are you sure it's a good idea to have all 3 columns both present for blocking and for matching?
Blocking fields should be there to weed out the junk, and matching fields are to differentiate the ones that pass the block by giving them a score.
If you match on exactly the same fields as the block, then everything that makes it through the match pass will be an exact match (on the loose key) and the only score differentiation will be because of less common sounding terms as per the frequency analysis.
Even if the blocking fields and matching fields aren't the same, you want to check against actual values (or tight keys) in the match pass, so it can score the ones that are exact (or almost) higher than the ones that just sneak through via the NYSIIS.
Exact checking on loose keys = blocking
Fuzzy checks on loose keys = minimal matching value
Fuzzy checks on exact values = better matching value