soundex function

Posted: Mon Mar 20, 2006 3:19 am
by Luk
Hello all (after a longer break)

Do you know anything about the soundex function and its usage for non-English languages?
Can I use it with Polish data?

Regards

Posted: Mon Mar 20, 2006 3:32 am
by ArndW
Yes, it can be used with any Latin-alphabet-based text. The algorithm for the Russell soundex is quite simple and will also work (a bit) with Polish.

The formula is to keep the first letter, remove every a, e, h, i, o, u, w and y from the rest of the string, and then assign numbers to the remaining letters (after the first) as follows:
b, f, p, v = 1
c, g, j, k, q, s, x, z = 2
d, t = 3
l = 4
m, n = 5
r = 6

I think there are some additional rules in there, but those are the most important ones for creating the comparison string. Even with English this algorithm has only limited use; it was derived for analysing and comparing proper names and does not attempt to model true pronunciation. From what little I know of Polish, the names will have fewer vowels (which are ignored in the soundex algorithm) and relatively more of the consonants that go into group 2.
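
For illustration, the rules above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation. The function name `soundex` and the table `CODES` are made up for this example. The "additional rules" it fills in are the ones commonly described for American Soundex and are not spelled out in the post above: adjacent letters with the same code collapse into one digit, h and w do not separate such letters, and the result is padded with zeros or cut to four characters. As a further assumption, any character outside a-z (including Polish letters such as ł or ś) is simply ignored, so Polish data would probably need to be transliterated first.

```python
# Letter groups from the post above, turned into a lookup table.
CODES = {}
for letters, digit in (
    ("bfpv", "1"),
    ("cgjkqsxz", "2"),
    ("dt", "3"),
    ("l", "4"),
    ("mn", "5"),
    ("r", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(name, length=4):
    """Return a Soundex-style code for name (sketch, see assumptions above)."""
    # Assumption: only plain a-z letters are considered; everything else is dropped.
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""

    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")  # code of the previous letter, "" for vowels

    for ch in letters[1:]:
        if ch in "hw":
            # h and w are dropped and do not break a run of same-coded letters
            continue
        code = CODES.get(ch, "")  # vowels and y have no code
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again

    # Pad with zeros or cut so the code has exactly `length` characters.
    return result.ljust(length, "0")[:length]


print(soundex("Robert"))    # R163
print(soundex("Rupert"))    # R163
print(soundex("Kowalski"))  # K420
print(soundex("Tymczak"))   # T522
```

Names that sound alike, such as Robert and Rupert, end up with the same code, which is what makes the codes usable as a comparison string. In Tymczak the c and z share one digit because both fall into group 2, which matches the point about Polish names leaning on that group.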

Posted: Mon Mar 20, 2006 3:42 am
by Luk
thanks :)