How to handle addresses with Puerto Rican token
Using the USADDR rule set, I am finding addresses that contain a Puerto Rican (P) token somewhat problematic. For example:
122 Villa Oaks street
has a token pattern of ^P+T, which sends the entire address to UnhandledData with an Unhandled Pattern of ^P++.
I realize I could fix this by adding token patterns of ^P+T and ^+PT to the rule set, but I am not sure that is the best solution.
Another option would be to remove the most common Puerto Rican words we see in our data (Villa and Casa) from the classifications file so they no longer are classified as Puerto Rican words.
Neither of these options seems thorough enough to be ideal.
Are there better options?
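To make the pattern in the question easier to follow, here is a minimal Python sketch of how a token pattern such as ^P+T could be derived from a classification lookup. It is only a toy model, not how QualityStage itself is implemented: the `CLASSIFICATIONS` table, the `input_pattern` function, and the `@` placeholder for mixed tokens are all assumptions made for illustration, and the table is not a copy of the real USADDR classifications file.

```python
# Toy classification table: token -> class code.
# Entries are illustrative only, not the real USADDR classifications file.
CLASSIFICATIONS = {
    "VILLA": "P",   # Puerto Rican word, as described in the post
    "CASA": "P",
    "STREET": "T",  # street type
    "CIRCLE": "T",
}


def input_pattern(address, classifications=CLASSIFICATIONS):
    """Return a pattern string with one class code per whitespace-separated token."""
    codes = []
    for token in address.upper().split():
        if token in classifications:
            codes.append(classifications[token])  # classified word
        elif token.isdigit():
            codes.append("^")                     # numeric token
        elif token.isalpha():
            codes.append("+")                     # unclassified alphabetic word
        else:
            codes.append("@")                     # placeholder used by this sketch for mixed tokens
    return "".join(codes)


if __name__ == "__main__":
    print(input_pattern("122 Villa Oaks street"))  # ^P+T

    # Dropping VILLA and CASA from the table mimics the second option in the post.
    trimmed = {k: v for k, v in CLASSIFICATIONS.items() if k not in ("VILLA", "CASA")}
    print(input_pattern("122 Villa Oaks street", trimmed))  # ^++T
```

In this toy model, removing the two words from the table turns the address into the same ^++T pattern as any other two-word street name, which is the effect the second option is aiming for.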
Bob
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Classification overrides for VILLA and CASA would seem most appropriate to me. Change the classification of these two words to ?
If you don't like to do that, try an Input Pattern override to process the ^P+T pattern to street number, street name, street name, street type.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I could not find a way to enter ? as the classification, so I chose + (Unknown word) for both Casa and Villa.
Testing, though, still leaves me confused:
234 CASA BONITA CIRCLE results in:
UnhandledPattern = ^++T
UnhandledData = 234 CASA BONITA CIRCLE
InputPattern = ^++T
234 CAS BONITA CIRCLE results in:
HouseNumber = 234
StreetName = CAS BONITA
StreetSuffixType = CIR
InputPattern = ^++T
Similar results are seen for Villa Oaks Street and Elm Oaks Street
How is it that the same input pattern can be both handled correctly and unhandled? What am I missing?
Bob
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
What kind of override did you use?
Classification, Input Text and Input Pattern overrides are applied before the pattern action language; Unhandled Text and Unhandled Pattern are applied afterwards.
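The ordering described above can be sketched as a small Python model. This is a deliberately simplified illustration under stated assumptions, not the QualityStage engine: the `resolve` function and its dictionary-based lookup tables are hypothetical, real pattern action language rules are far richer than an exact-match lookup, and the field names are taken from the test output quoted earlier in the thread.

```python
def resolve(tokens, pattern, input_overrides, pattern_actions, unhandled_overrides):
    """Return (stage, fields) for the first stage that knows the pattern.

    Each table maps a pattern string to a list of output field names,
    one per token. The stages are tried in the order given in the post.
    """
    stages = [
        ("input pattern override", input_overrides),         # applied before the pattern actions
        ("pattern action", pattern_actions),                 # stand-in for the pattern action language
        ("unhandled pattern override", unhandled_overrides), # applied afterwards
    ]
    for stage, table in stages:
        fields = table.get(pattern)
        if fields is not None:
            return stage, list(zip(fields, tokens))
    # Nothing matched: the whole address is left unhandled.
    return "unhandled", [("UnhandledData", " ".join(tokens))]


if __name__ == "__main__":
    tokens = ["122", "VILLA", "OAKS", "STREET"]
    street = ["HouseNumber", "StreetName", "StreetName", "StreetSuffixType"]

    # No table knows ^P+T, so the address falls through to "unhandled".
    print(resolve(tokens, "^P+T", {}, {}, {}))

    # An input pattern override is consulted first, before any pattern action.
    print(resolve(tokens, "^P+T", {"^P+T": street}, {}, {"^P+T": street}))
```

The point of the sketch is only the precedence: an override registered at the wrong stage never gets a chance to fire if an earlier stage has already claimed (or rejected) the pattern.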
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 527
- Joined: Thu Apr 19, 2007 1:25 am
- Location: Melbourne
Sounds like it should do what you want.
Have you tried saving and provisioning the ruleset? I know it sounds dumb, like rebooting Windows whenever something weird happens, but I have seen it do the trick sometimes.
Otherwise, try it as an Input Pattern Override (IPO) like you suggested up front. Reclassifying the two tokens will change how the ruleset processes valid PR input patterns that contain those tokens, even when they aren't related to your specific examples.
An IPO will have less impact on the rest of the ruleset, which in some situations specifically looks for CASA and VILLA to do certain PR processing.
I'm glad to hear that what I did SHOULD have worked because I thought I was beginning to understand it. However, it is not working even after doing the provision all.
As a workaround I removed the words "casa" and "villa" from the classification file in the set, and that seems to have had the desired effect. I am not sure why that works but reclassifying them in the IPO does not.
I'm considering opening a PMR.
Bob
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Read through the PAT file for the rule set - see if (where) CASA and VILLA are explicitly handled. See if you can pre-empt that in the way that Stuart suggested.
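If the PAT file is available as plain text, a quick way to do that read-through is to list every line that mentions either word. The following Python sketch assumes exactly that; the function name `find_literal_references` and the file path in the usage comment are hypothetical, and the search is a plain whole-word text match that knows nothing about pattern action syntax.

```python
import re


def find_literal_references(lines, words):
    """Return (line_number, text) pairs for lines that mention any of the words.

    Matching is case-insensitive and whole-word, so VILLA does not match VILLAGE.
    """
    alternatives = "|".join(re.escape(word) for word in words)
    matcher = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    hits = []
    for number, line in enumerate(lines, start=1):
        if matcher.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Hypothetical path: point this at wherever your copy of the PAT file lives.
    with open("USADDR.PAT", encoding="utf-8", errors="replace") as handle:
        for number, text in find_literal_references(handle, ["CASA", "VILLA"]):
            print(f"{number}: {text}")
```

Any hits are candidates for the places where the rule set treats CASA and VILLA specially, which is where an override would need to get in first.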
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 612
- Joined: Thu May 03, 2007 4:59 am
- Location: Melbourne
Use "T-Street Types" as the classification for CASA (with standard form CASA) and VILLA (with standard form VILLA). This should do the job.
Joshy George