Phonetic Code Generation

Posted: Sat Feb 23, 2013 5:09 am
by nilanjan
Hi,

My requirement is to generate a reverse Soundex phonetic code for a complete string, say "Byblos Restaurent". Is there a function for this? How can I do it?

Posted: Sat Feb 23, 2013 2:34 pm
by ray.wurlod
It's possible, but only if you're creating your own PAL script. Is this the case?

Posted: Sun Feb 24, 2013 4:23 pm
by stuartjvnorton
That's an odd requirement to be given.
Let's step back from the "requirement" for a minute. What is the problem? Is there another way to fulfill it?

As you have seen, Soundex or RSoundex have limitations. Even more than the length issue, they are a little bit loose for some tastes.
Have you tried RNYSIIS? It's both longer and a bit more comprehensive phonetically than RSoundex. What region is your data from? A lot of the "standard" phonetic algorithms work best (if at all) with names where English is the primary language. If you have other needs, you'll have to do as Ray said, but using an algorithm that makes more sense for the data.

Are you standardising the name first?
I suggest you do for a couple of reasons:
- "Fix" some spelling errors and standardise full word vs abbreviations etc
- The standard name rulesets split the name into its important words and generate NYSIIS and RSOUNDEX codes for each word individually. That might remove the "requirement".
- Nicknames: Bob and Robert won't match but most likely should.
- Gives you the option to check the phonetic keys in varying orders that can allow you to match names where the word order is a little swapped around.
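
To make the per-word idea concrete, here is a minimal Python sketch (not QualityStage code). It assumes the common American Soundex rules, and it assumes that "reverse Soundex" simply means Soundex applied to the reversed word. The function names `soundex`, `reverse_soundex` and `word_keys` are made up for illustration, and the codes a QualityStage ruleset emits may differ.

```python
# Hypothetical helpers for illustration only; not a QualityStage API.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word, length=4):
    """American Soundex of a single word, padded with zeros to `length`."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return "".join(out)[:length].ljust(length, "0")


def reverse_soundex(word, length=4):
    """Assumption: reverse Soundex is Soundex of the reversed word."""
    return soundex(word[::-1], length)


def word_keys(name):
    """One (soundex, reverse_soundex) pair per whitespace-separated word."""
    return [(soundex(w), reverse_soundex(w)) for w in name.split()]


print(word_keys("Byblos Restaurent"))
# [('B142', 'S411'), ('R236', 'T563')]
```

Sorting the pairs before comparing (for example `sorted(word_keys(a)) == sorted(word_keys(b))`) is one simple way to let "Restaurant Byblos" match "Byblos Restaurent" despite the swapped word order and the spelling difference.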

Posted: Mon Feb 25, 2013 1:18 am
by nilanjan
Hi Ray, Stuart,

My requirement is this: we have a base table that holds about 12 million records. We now receive delta records that we need to compare with the base table data on DBA NAME and STREET NAME by phonetic code match. The matching itself will be done by another tool (Ab Initio); my concern is only to generate the phonetic codes for DBA NAME and STREET NAME. For STREET NAME, the Soundex function does not generate a proper code: for example, '2825 N SCOTTSDALE RD' and '25 N SCOTTSDALE RD' generate the same code, which is not appropriate. I am new to DataStage and really don't have any idea how to do this.

What is a PAL script?

Posted: Mon Feb 25, 2013 7:17 am
by stuartjvnorton
If you have a complete address and just want the street name, you'll need to parse it first. Check out lesson 2 of the QualityStage tute: it will do what you want it to.

If you do want some version of the whole address for matching, then maybe a single phonetic value is neither doable nor sufficient.
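
As a rough illustration of why parsing comes first, the Python sketch below splits an address into house number, optional direction and street before any phonetic coding. This is not how QualityStage rulesets work internally; the regular expression and the name `split_address` are assumptions made for this example and only cover simple US-style addresses like the two quoted above.

```python
import re

# Hypothetical pattern: leading house number, optional compass direction, rest is street.
_ADDRESS = re.compile(
    r"^\s*(?P<number>\d+)\s+"
    r"(?:(?P<direction>N|S|E|W|NE|NW|SE|SW)\s+)?"
    r"(?P<street>.+?)\s*$",
    re.IGNORECASE,
)


def split_address(address):
    """Split an address line into number, direction and street parts."""
    m = _ADDRESS.match(address)
    if m is None:
        # No leading house number: treat the whole line as the street.
        return {"number": "", "direction": "", "street": address.strip().upper()}
    return {
        "number": m.group("number"),
        "direction": (m.group("direction") or "").upper(),
        "street": m.group("street").upper(),
    }


a = split_address("2825 N SCOTTSDALE RD")
b = split_address("25 N SCOTTSDALE RD")
print(a)
print(b)
```

With the parts separated, the house number can be compared exactly while only the street part is passed to a phonetic function, so the two addresses no longer collapse into one key.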

Posted: Mon Feb 25, 2013 8:26 am
by chulett
Not any kind of helpful, I know, but I find it amusing that you are using one ETL tool to feed another. Seems to me either one should be able to do the whole task. [shrug]

Posted: Mon Feb 25, 2013 2:32 pm
by ray.wurlod
You can do all of this with QualityStage - no need to use Ab Initio. Maybe Ab Initio can do it all too - that I don't know.

Posted: Tue Feb 26, 2013 7:36 am
by nilanjan
chulett wrote:Not any kind of helpful, I know, but I find it amusing that you are using one ETL tool to feed another. Seems to me either one should be able to do the whole task. [shrug]
Actually, I am completely new to DataStage, and it is also new to my project. We also have a huge volume of data (about 12 million records), so I am worried about the performance of DS/QS. Is there any possibility of a performance issue?

Posted: Tue Feb 26, 2013 8:21 am
by BI-RMA
Is there any possibility of a performance issue?
Actually - given your DS-Server has got sufficient resources - No.

Posted: Tue Feb 26, 2013 8:30 am
by chulett
I would say there is always a possibility. That possibility goes up with any tool if you are new to it and there's no-one onsite to mentor you. However, I wouldn't let that stop you... why solve only half the problem? :wink: