How to handle addresses with Puerto Rican token

Infosphere's Quality Product

Moderators: chulett, rschirm

bobyon
Premium Member
Posts: 200
Joined: Tue Mar 02, 2004 10:25 am
Location: Salisbury, NC

How to handle addresses with Puerto Rican token

Post by bobyon »

Using the USADDR rule set, I am finding addresses that contain a Puerto Rican (P) token somewhat problematic. For example:

122 Villa Oaks street

has a token pattern of ^P+T, which sends the entire address to UnhandledData with an Unhandled Pattern of ^P++
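
To make the token pattern concrete, here is a minimal Python sketch of how a classification lookup could turn an address into a pattern string. It is only an illustration, not QualityStage's implementation: the `CLASSIFICATIONS` table and the `token_pattern` function are hypothetical, and the real USADDR classification file is much larger.

```python
# Hypothetical, heavily trimmed classification table: word -> class.
# The real USADDR classification file is far larger and may differ.
CLASSIFICATIONS = {
    "VILLA": "P",   # Puerto Rican word
    "CASA": "P",    # Puerto Rican word
    "STREET": "T",  # street type
    "CIRCLE": "T",  # street type
}


def token_pattern(address, classifications=CLASSIFICATIONS):
    """Return one class character per whitespace-separated token."""
    pattern = []
    for word in address.upper().split():
        if word.isdigit():
            pattern.append("^")  # numeric token
        elif word in classifications:
            pattern.append(classifications[word])
        else:
            pattern.append("+")  # any other word is treated as unclassified
    return "".join(pattern)


print(token_pattern("122 Villa Oaks street"))
print(token_pattern("122 Elm Oaks street"))
```

With this toy table the first address classifies as ^P+T because VILLA is in the table, while the second classifies as ^++T.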

I realize I could fix this by adding token patterns of ^P+T and ^+PT to the rule set, but I am not sure that is the best solution.

Another option would be to remove the most common Puerto Rican words we see in our data (Villa and Casa) from the classifications file so they no longer are classified as Puerto Rican words.

Neither of these options seems thorough enough to be ideal.

Are there better options?
Bob
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Not really, but a user override (or two) may be easier than removing anything from the classification table.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bobyon

Post by bobyon »

Thanks for the suggestion, Ray. However, being the QS newbie that I am, could you please elaborate on what we could do in the overrides that would help?
Bob
ray.wurlod

Post by ray.wurlod »

Classification overrides for VILLA and CASA would seem most appropriate to me. Change the classification of these two words to ?

If you don't like to do that, try an Input Pattern override to process the ^P+T pattern to street number, street name, street name, street type.
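
As a rough illustration of what such an Input Pattern override expresses, the Python sketch below maps each position of the ^P+T pattern to an output field. The `INPUT_PATTERN_OVERRIDES` table and the function name are hypothetical, and the field name `StreetName` is an assumption; the real override is defined in the rule set's override editor, not in code like this.

```python
# Hypothetical override table: input pattern -> one output field per token.
INPUT_PATTERN_OVERRIDES = {
    "^P+T": ["HouseNumber", "StreetName", "StreetName", "StreetSuffixType"],
}


def apply_input_pattern_override(tokens, pattern,
                                 overrides=INPUT_PATTERN_OVERRIDES):
    """Assign tokens to fields if the pattern has an override, else None."""
    fields = overrides.get(pattern)
    if fields is None or len(fields) != len(tokens):
        return None
    result = {}
    for token, field in zip(tokens, fields):
        # Tokens sent to the same field are joined with a space.
        result[field] = f"{result[field]} {token}" if field in result else token
    return result


print(apply_input_pattern_override(["122", "VILLA", "OAKS", "STREET"], "^P+T"))
```

The sketch only moves tokens into fields. It does not standardize values, so STREET is not abbreviated the way a real rule set would abbreviate it.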
bobyon

Post by bobyon »

I could not find a way to enter ? as the classification, so I chose + (Unknown word) for both Casa and Villa.

Testing, though, still leaves me confused:

234 CASA BONITA CIRCLE results in:
UnhandledPattern = ^++T
UnhandledData = 234 CASA BONITA CIRCLE
InputPattern = ^++T


234 CAS BONITA CIRCLE results in:
HouseNumber = 234
Street Name = CAS BONITA
StreetSuffixType = CIR
InputPattern = ^++T

Similar results are seen for Villa Oaks Street and Elm Oaks Street

How is it that the same input pattern can be both handled correctly and unhandled? What am I missing?
Bob
ray.wurlod

Post by ray.wurlod »

What kind of override did you use?
Classification, Input Text and Input Pattern overrides are applied before the pattern action language; Unhandled Text and Unhandled Pattern are applied afterwards.
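
The ordering described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the product's logic: every function and parameter name is hypothetical, Input Text and Unhandled Text overrides are left out for brevity, and the pattern action language is reduced to a set of patterns it is assumed to handle. It shows the expected order only and does not reproduce the behaviour reported earlier in this thread.

```python
def classify(tokens, classes):
    """Build a pattern string: ^ for numbers, table class or + otherwise."""
    return "".join(
        "^" if t.isdigit() else classes.get(t, "+") for t in tokens
    )


def standardize(address, base_classes, class_overrides,
                input_pattern_overrides, pal_patterns,
                unhandled_pattern_overrides):
    """Return (stage_that_handled_the_record, pattern)."""
    tokens = address.upper().split()
    # 1. Classification overrides take effect before anything else.
    pattern = classify(tokens, {**base_classes, **class_overrides})
    # 2. Input Pattern overrides run before the pattern action language.
    if pattern in input_pattern_overrides:
        return "input pattern override", pattern
    # 3. The pattern action language gets the record next.
    if pattern in pal_patterns:
        return "pattern action language", pattern
    # 4. Unhandled Pattern overrides only see what is left over.
    if pattern in unhandled_pattern_overrides:
        return "unhandled pattern override", pattern
    return "unhandled", pattern


base = {"CASA": "P", "CIRCLE": "T"}
handled_by_pal = {"^++T"}  # assumption: the rule set handles this pattern

print(standardize("234 CASA BONITA CIRCLE", base, {}, set(),
                  handled_by_pal, set()))
print(standardize("234 CASA BONITA CIRCLE", base, {"CASA": "+"}, set(),
                  handled_by_pal, set()))
```

In this model, reclassifying CASA as + changes the pattern from ^P+T to ^++T before the pattern action language runs, which is why a Classification override would be expected to help.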
bobyon

Post by bobyon »

I used a Classification override.
Bob
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Sounds like it should do what you want.
Have you tried saving and provisioning the ruleset? I know it sounds dumb and the same as rebooting Windows whenever something weird happens, but I have seen it do the trick sometimes.

Otherwise, try it as an IPO (Input Pattern override) like you suggested up front. Reclassifying the two tokens will change how the ruleset processes valid PR input patterns that contain those tokens, even when they aren't related to your specific examples.
An IPO will have less impact on the rest of the ruleset, which in some situations specifically looks for CASA and VILLA to do certain PR processing.
bobyon

Post by bobyon »

I'm glad to hear that what I did SHOULD have worked, because I thought I was beginning to understand it. However, it is not working even after doing a Provision All.

As a workaround I removed the words "casa" and "villa" from the classification file in the set, and that seems to have had the desired effect. I am not sure why that works but reclassifying them in the override does not.

I'm considering opening a PMR.
Bob
ray.wurlod

Post by ray.wurlod »

Read through the PAT file for the rule set - see if (where) CASA and VILLA are explicitly handled. See if you can pre-empt that in the way that Stuart suggested.
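
One quick way to do that search is a small script like the Python sketch below, which lists every line of a PAT file that mentions CASA or VILLA. The file name `USADDR.PAT` is a placeholder assumption; point it at wherever your copy of the rule set's pattern action file actually lives.

```python
import re
from pathlib import Path


def find_literal_references(pat_text, words=("CASA", "VILLA")):
    """Return (line_number, line) pairs that mention any of the words."""
    regex = re.compile(
        r"\b(" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
    )
    return [
        (number, line.rstrip())
        for number, line in enumerate(pat_text.splitlines(), start=1)
        if regex.search(line)
    ]


if __name__ == "__main__":
    path = Path("USADDR.PAT")  # hypothetical location of the exported file
    if path.exists():
        text = path.read_text(errors="replace")
        for number, line in find_literal_references(text):
            print(f"{number}: {line}")
```

Any hits show where the rule set refers to those words literally, which is where an override would need to pre-empt it.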
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

Use "T-Street Types" as the classification for CASA (with standard form CASA) and VILLA (with standard form VILLA). This should do the job.
Joshy George