Manipulate standardization rule

Posted: Tue Mar 05, 2013 8:27 am
by nilanjan
Hi,
I want to manipulate the standardization output columns. Can I do that? Suppose I want to generate an RVSNDX column for Primaryword1, Primaryword2 and Primaryword3 together. Can I do that by concatenating all three columns?

Posted: Tue Mar 05, 2013 1:14 pm
by ray.wurlod
Yes you can, but it's probably a waste of time. RVSNDX (and Soundex) will only look at 4-6 characters.
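
To illustrate the point about key length, here is a minimal Python sketch. It assumes classic four-character American Soundex, and it assumes that RVSNDX behaves like Soundex applied to the reversed string; the thread does not confirm QualityStage's exact implementation, and the sample words are made up.

```python
# Classic Soundex letter groups (assumption: RVSNDX uses the same groups).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Four-character Soundex key: first letter plus up to three digits."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if len(key) == 4:
            break  # everything past this point is ignored
        if ch in "HW":
            continue  # H and W do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return key.ljust(4, "0")


def reverse_soundex(word):
    """Hypothetical stand-in for RVSNDX: Soundex of the reversed string."""
    return soundex(word[::-1])


words = ["INTERNATIONAL", "BUSINESS", "MACHINES"]  # made-up sample values
joined = "".join(words)

print(soundex(joined), soundex(words[0]))                    # same key
print(reverse_soundex(joined), reverse_soundex(words[-1]))   # same key
```

Under these assumptions the concatenated string gets the same Soundex key as the first word alone and the same reverse key as the last word alone, so the other words contribute nothing.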

Posted: Wed Mar 06, 2013 12:17 am
by ray.wurlod
For those particular combinations you are correct. It is not possible to assert that you are generally correct. Other words are similar at the right-hand end, particularly names of corporate entities.

Posted: Wed Mar 06, 2013 6:30 pm
by stuartjvnorton
You're obviously not using QualityStage to do this soundex. Soundex in a Transformer stage?

If you want a soundex that works on longer strings, you'll have to write it yourself. Though why you would, I'm not sure: phonetically it's quite loose, and it also falls down when the first letters of the strings differ.
So KrispyKreeme and CrispyKreeme will never match, regardless of how long you make the key.
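
A quick way to see this is a hand-written Soundex with a configurable key length. This is only a sketch in Python, assuming the classic Soundex letter groups; the `length` parameter is an invention for illustration and is not something QualityStage exposes.

```python
# Classic Soundex letter groups.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word, length=4):
    """Soundex key of `length` characters (hypothetical extended variant)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0]  # the first letter is kept verbatim, never encoded
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if len(key) == length:
            break
        if ch in "HW":
            continue  # H and W do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code
    return key.ljust(length, "0")


for n in (4, 8, 12):
    k = soundex("KrispyKreeme", n)
    c = soundex("CrispyKreeme", n)
    # The digit tails agree, but the leading letter always differs.
    print(n, k, c, k == c)
```

Because the first letter is copied into the key as-is, the two keys share every digit but still differ at position one, whatever length is chosen.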

If you want to use the ruleset properly (and back towards the original question), you'll have to look at the fields you are given and understand what the ruleset does. If you need to, you can change the PAT and DCT files to add RVSNDX and NYSIIS fields to MatchPrimaryName3 (and 4 and 5 if need be) as well.

Posted: Wed Mar 06, 2013 10:58 pm
by nilanjan
Stuart,
Yes, you are right. I want to implement a more powerful phonetic algorithm (reverse Soundex, Metaphone, Double Metaphone, etc.), but as I am new to this tool, I really don't understand how to do that. I have never changed PAT or DCT files. I guess it is very tricky to change anything in those files. Can you describe in detail how I can do that?

Posted: Wed Mar 06, 2013 10:59 pm
by nilanjan
ray.wurlod wrote:For those particular combinations you are correct. It is not possible to assert that you are generally correct. Other words are similar in the right hand end, particularly names of corporate entitie ...
Ray,
As I am not a premium user, I am unable to see your complete reply.

Posted: Wed Mar 06, 2013 11:33 pm
by stuartjvnorton
You can't change the phonetic algorithms that are used in the QS rulesets.
They are part of the PAL.

The DCT file is just the output metadata for the ruleset. The QS user guide will explain it to you. As for the PAT file, read the Pattern Action Language Reference to understand what is in there. If I were doing it, I'd look at how it currently populates MatchPrimaryWord2NYSIIS and MatchPrimaryWord2RVSNDX, and apply the same logic to MatchPrimaryWord3.

You would be able to write your own custom function in C/C++ to implement any phonetic algorithm and use it from within a Transformer stage (although some, like Double Metaphone, may produce two output strings, so that will affect how you use it). That is not QualityStage, however.