Soundex() logic

Archive of postings to DataStageUsers@Oliver.com. This forum is intended only as a reference and cannot be posted to.

Moderators: chulett, rschirm

admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Dear all

Can anyone enlighten me about the logic used by the Soundex() function for generating the codes?

Regards

Asim Munshi
Consultant - Business Intelligence,
Sonata Software Limited,
193, R V Road
Basavangudi, Bangalore - 560004.
Tel: 91-80-6567492, 6567497 - ext 2772
Fax: 91-080-6567487
Website: http://www.sonata-software.com
Alt. email: asim_munshi@yahoo.com


*********************************************************************
Disclaimer: The information in this e-mail and any attachments is confidential / privileged. It is intended solely for the addressee or addressees. If you are not the addressee indicated in this message, you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer does not consent to Internet email for messages of this kind.
*********************************************************************

Post by admin »

Hi Asim

I use the Soundex function to conform data to a standard code and name. The business I work for has several systems, and each system has a different code, as well as a slightly different name, for the same business unit. Soundex has worked well because it has eliminated the "pain in the neck" maintenance of translation tables. Briefly, here is what I do:

1. Build a hash file keyed on the generated Soundex code, e.g. B653 is generated from Brenthurst Rehabilitation Clinic. The hash file contains the conformed code and name for the business unit, as well as the Soundex code, which is the key.

2. Read the data source that contains a code and name, e.g. Brenthurst Rehab Clinic.

3. Do a lookup on the hash file using the Soundex function, e.g.
Soundex(ClinicName)
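
The three steps above can be sketched outside DataStage roughly as follows. This is only an illustration in Python: the dictionary stands in for the hash file, the conformed code "BRC01" is a made-up value, and the soundex() helper follows the commonly documented American Soundex rules, which may not match DataStage's Soundex() in every edge case (for example, in how spaces are treated).

```python
# Letter groups from the commonly documented American Soundex rules.
GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
CODES = {ch: digit for letters, digit in GROUPS.items() for ch in letters}


def soundex(name):
    """Return a 4-character Soundex code, or "" if the name has no letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break up a run of same-coded letters
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated code after it counts
    return code.ljust(4, "0")


# Step 1: the "hash file", keyed on the Soundex code of the conformed name.
# The code "BRC01" is a hypothetical conformed business-unit code.
conformed = {
    soundex("Brenthurst Rehabilitation Clinic"): ("BRC01", "Brenthurst Rehabilitation Clinic"),
}


def lookup(clinic_name):
    """Steps 2 and 3: take a source name and look it up by its Soundex code."""
    return conformed.get(soundex(clinic_name))


print(soundex("Brenthurst Rehab Clinic"))  # B653
print(lookup("Brenthurst Rehab Clinic"))   # ('BRC01', 'Brenthurst Rehabilitation Clinic')
```

Note that only the first letter and the next three coded consonants contribute to the key, so everything after "Brent" is ignored in this example.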

Hope this helps explain Soundex...

Mark





*********************************************************************
This footnote confirms that this e-mail message has been scanned for the presence of known computer viruses by the MessageLabs Virus
Control Centre. However, it is still recommended that you use local virus scanning software to monitor for the presence of viruses.
*********************************************************************

Post by admin »

Asim,

Keep in mind that Soundex does not generate unique keys and is not intended to. In fact, its intention is quite the reverse. The purpose of Soundex is to generate a common code for names that sound similar. Normally, you would present this list to a user, so that they can choose a name from a list of names that sound similar to what they typed. This means that they don't have to get the spelling exactly right to find the name.

In a warehouse environment? We don't use it, but it is a common algorithm for matching similar-sounding names. At the very least, it is useful for DataStage to be able to generate these codes for storage in a warehouse so that other interactive tools can use the stored code.

As Mark describes below, it could have some application in automated data matching, but I suspect its use would really be limited to finding a list of potential matches.
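
To illustrate the "list of potential matches" idea, here is a small Python sketch. The names in known_names are invented sample data, and the soundex() helper implements the commonly documented American Soundex rules, which are an assumption here and may differ from what DataStage does in edge cases.

```python
from collections import defaultdict

# Letter groups from the commonly documented American Soundex rules.
GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
CODES = {ch: digit for letters, digit in GROUPS.items() for ch in letters}


def soundex(name):
    """Return a 4-character Soundex code, or "" if the name has no letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break up a run of same-coded letters
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated code after it counts
    return code.ljust(4, "0")


def build_index(names):
    """Group names by Soundex code; several names can share one code."""
    index = defaultdict(list)
    for name in names:
        index[soundex(name)].append(name)
    return index


def candidates(index, typed):
    """Return every stored name whose code matches the typed name's code."""
    return index.get(soundex(typed), [])


# Hypothetical sample data.
known_names = ["Robert", "Rupert", "Rubin",
               "Brenthurst Rehabilitation Clinic", "Brentwood Clinic"]
index = build_index(known_names)

print(candidates(index, "Robbert"))
# ['Robert', 'Rupert']
print(candidates(index, "Brenthurst Rehab Clinic"))
# ['Brenthurst Rehabilitation Clinic', 'Brentwood Clinic']
```

The second result shows why the code is not a safe unique key: two different clinics share B653, so a person (or a further matching rule) still has to pick the right one.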

As for the logic it uses, I don't recall the precise detail (it has been around for a very long time), but it is something like this.

* Take the first letter of the name as the first letter of the code.
* Remove all remaining vowels from the name.
* Deal with double letters, ph, gh combinations etc.
* The next 3 remaining consonants are now grouped into 6 (I think) groups which sound similar. These groups are simply numbered 1 to 6.
* The 3 digits of the code following the initial letter represent these next 3 consonants based on their grouping. If there aren't 3 remaining consonants, use 0 for the missing ones.

That's approximately it in a nutshell. I'm sure a web search would turn up the precise specification. As I said, it has been around for a very long time.
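
For comparison with the outline above, here is a Python sketch of the commonly documented American Soundex rules. It is not taken from DataStage, and whether DataStage's Soundex() agrees with it on every edge case is not confirmed in this thread. In this variant there are indeed 6 groups, the letters H, W and Y are dropped along with the vowels, adjacent letters from the same group collapse to one digit, and there is no special handling of ph or gh.

```python
# Consonant groups 1 to 6; vowels and H, W, Y carry no code.
GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
CODES = {ch: digit for letters, digit in GROUPS.items() for ch in letters}


def soundex(name):
    """Return a 4-character Soundex code, or "" if the name has no letters."""
    # Keep only A-Z so spaces, digits and punctuation are ignored.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    # The first letter is kept as-is.
    code = letters[0]
    # Remember its group so an immediate repeat of that group is skipped.
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped without separating equal codes
        digit = CODES.get(ch, "")  # "" for vowels and Y
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel between two equal codes lets both count
    # Pad with zeros when fewer than 3 coded consonants remain.
    return code.ljust(4, "0")


for name in ["Robert", "Rupert", "Tymczak", "Pfister", "Ashcraft", "Lee"]:
    print(name, soundex(name))
# Robert R163
# Rupert R163
# Tymczak T522
# Pfister P236
# Ashcraft A261
# Lee L000
```

Robert and Rupert producing the same code is the intended behaviour: similar-sounding names share a code.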


-----Original Message-----
From: Ewart-Phipps,Mark [mailto:Mark.Ewart-Phipps@afrox.boc.com]
Sent: Monday, 12 November 2001 8:43 PM
To: datastage-users@oliver.com
Subject: RE: Soundex() logic


Post by admin »

I found this:

http://www.bluepoof.com/Soundex/info.html


-----Original Message-----
From: Asimkumar A. Munshi. [mailto:masi@sonata-software.com]
Sent: Monday, November 12, 2001 3:48 AM
To: datastage-users@oliver.com
Subject: Soundex() logic

