soundex function


Luk
Participant
Posts: 133
Joined: Thu Dec 02, 2004 8:35 am
Location: Poland

Post by Luk »

Hello all (after a long absence),

Do you know anything about the soundex function and its usage for non-English languages?
Can I use it with Polish data?

Regards
LUK
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Yes, it can be used with any text based on the Latin alphabet. The algorithm for the Russell soundex is quite simple and will also work (a bit) with Polish.

The formula is to keep the first letter, remove every a, e, h, i, o, u, w and y from the rest of the string, and then assign numbers to the remaining letters (after the first) as follows:
b, f, p, v = 1
c, g, j, k, q, s, x, z = 2
d, t = 3
l = 4
m, n = 5
r = 6
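
The mapping above can be sketched in Python roughly as follows. This is only an illustration, not the DataStage SOUNDEX routine: the function name `soundex` is made up here, and collapsing adjacent letters with the same code, letting h and w sit between two such letters without separating them, and padding or cutting the result to one letter plus three digits are assumptions taken from the commonly described American Soundex rules, not from this post. Skipping every character outside a-z is also just a choice of the sketch.

```python
# Letter groups as listed above; vowels and h, w, y get no code.
GROUPS = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
          ("l", "4"), ("mn", "5"), ("r", "6"))
CODES = {ch: digit for letters, digit in GROUPS for ch in letters}


def soundex(name: str) -> str:
    """Hypothetical helper: one letter plus three digits, e.g. 'R163'."""
    # Assumption: characters outside a-z (spaces, hyphens, accented
    # letters) are simply skipped.
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Assumption: adjacent letters with the same code count once.
        if code and code != prev:
            result += code
        # Assumption: h and w do not break a run of equal codes,
        # while vowels and y do.
        if ch not in "hw":
            prev = code
    # Assumption: pad with zeros or truncate to four characters.
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))      # R163 R163
print(soundex("Kowalski"), soundex("Kowalsky"))  # K420 K420
```

Because the sketch skips anything outside a-z, Polish letters such as ł or ś would drop out entirely unless they are first mapped to plain Latin letters.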

I think there are some additional rules, but those are the most important ones for creating the comparison string. Even with English this algorithm is of limited use: it was designed for analyzing and comparing proper names and does not attempt to model true pronunciation. From what little I know of Polish, the names will have fewer vowels (which are ignored in the soundex algorithm) and relatively more of the consonants that go into group 2.

Post by Luk »

Thanks :)
LUK