Hi Experts,
I am new to QualityStage and need your help with one QualityStage job.
I have an address "7104 UPPER 17TH ST N" that I am trying to standardize using the QualityStage USADDR rule set. But StreetName is being populated with "N", which should be the street suffix, and "UPPER" is going into the Box Type field when it should be populated in StreetName. So I need an opinion from all of you experts.
I believe this can be done with a pattern override, but I am not sure how.
But if the address is written like this, "7104 N 17TH ST UPPER", it standardizes correctly.
I badly need help from you experts.
Thanks in advance.
Address is not standardizing correctly.
-
- Premium Member
- Posts: 425
- Joined: Sat Nov 19, 2005 9:26 am
- Location: New York City
- Contact:
Yes, this can be done using an Input Pattern Override. Basically, the input pattern assigned to your string is coming out as:
^U>TD where ^ = 7104
U = UPPER
> = 17th
T = ST
D = N
Just open the Rules Management window: Standardization Rule ==> USA ==> ADDR, then double-click on the red one (the USADDR rule set).
Double-click on Overrides, then go to the Input Pattern page.
Type your input pattern (^U>TD) in the Enter Input Pattern box.
In the Current Pattern List window, select the first entry in the list (^) and, under User Override, select the classification for the token, in this case House Number.
Select the second entry in the Current Pattern List window (U) and, under User Override, select the classification for the token, in this case Street Name.
Repeat the process for every token in your pattern. When done, press the Add button in the Override Summary window, then press the OK button to save the changes.
The input pattern and the override sequence you entered will show up in the Override Summary window. If you have other patterns that you would like to override, repeat the process.
You can test the override's action in the main window.
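To make the idea concrete, here is a minimal Python sketch (not QualityStage code) of what an input pattern override amounts to: the tokens are classified, the classes are joined into a pattern string, and if that exact pattern has an override, each token goes straight to the field chosen for it. The tiny classification table, the field names for the last three tokens, and the `standardize` helper are all assumptions for illustration; only "House Number" for ^ and "Street Name" for U come from the steps above.

```python
import re

# Hypothetical, heavily reduced classification table (token -> class).
# The real USADDR classification file is far larger.
CLASSIFICATIONS = {"UPPER": "U", "ST": "T", "N": "D"}

# Override table: exact input pattern -> one target field per token.
# Field names other than HouseNumber/StreetName are illustrative guesses.
INPUT_PATTERN_OVERRIDES = {
    "^U>TD": ["HouseNumber", "StreetName", "StreetName",
              "StreetSuffixType", "StreetSuffixDirection"],
}

def classify(token):
    """Return a one-character class for a token (simplified)."""
    if token in CLASSIFICATIONS:
        return CLASSIFICATIONS[token]
    if token.isdigit():
        return "^"   # numeric, e.g. 7104
    if re.fullmatch(r"\d+[A-Z]+", token):
        return ">"   # leading numeric, e.g. 17TH
    return "?"       # placeholder for anything else in this sketch

def standardize(address):
    """Return (pattern, fields); fields is None when no override applies."""
    tokens = address.upper().split()
    pattern = "".join(classify(t) for t in tokens)
    targets = INPUT_PATTERN_OVERRIDES.get(pattern)
    if targets is None:
        return pattern, None  # would fall through to the pattern file
    fields = {}
    for token, field in zip(tokens, targets):
        # Tokens sent to the same field are joined with a space.
        fields[field] = (fields.get(field, "") + " " + token).strip()
    return pattern, fields

pattern, fields = standardize("7104 UPPER 17TH ST N")
print(pattern, fields)
```

Because the lookup is on the whole pattern string, the override only affects addresses that classify to exactly ^U>TD; every other pattern still goes through the normal rules.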
Julio Rodriguez
ETL Developer by choice
"Sure we have lots of reasons for being rude - But no excuses
-
- Participant
- Posts: 527
- Joined: Thu Apr 19, 2007 1:25 am
- Location: Melbourne
I looked at Julio's response and thought "That's weird. Why would he have UPPER as a Unit Type? Surely that must stuff things up, considering you now have a token containing terms that don't necessarily mean the same thing..."
You start with ^ U > T D
Line 2603 of USADDR.PAT
*U | > | [ {UnitType} = "" & {UnitValue} = "" ]
COPY_A [1] {UnitType}
COPY [2] {UnitValue}
RETYPE [1] 0
RETYPE [2] 0
So now you have
UnitType: UPPER
UnitValue: 17th
Remaining Pattern = ^ T D
- Game over. But let's see it out.
Line 3188:
^ | T | D =T= "E","N","S","W"
RETYPE [3] ^
Remaining Pattern = ^ T ^
Line 3874:
^ | T | ^ | $ | [ {StreetName} = "" ] ; Common Street Pattern Found: CALL Address_Type SUBROUTINE then EXIT
COPY [1] {HouseNumber}
COPY_A [2] {StreetPrefixType}
COPY [3] {StreetName}
CALL Address_Type
EXIT
Final score:
UnitType: UPPER
UnitValue: 17th
HouseNumber: 7104
StreetPrefixType: ST
StreetName: N
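The trace above can be replayed with a small Python sketch. This is only a simulation of the three pattern-action steps quoted in this post, under simplifying assumptions: each rule is hard-coded, matching is done on the full list of remaining classes, and COPY_A is treated as a plain copy of the token text. The function name `replay` is made up for the example.

```python
def replay(tokens):
    """tokens: list of (text, class) pairs. Returns the populated fields."""
    toks = [list(t) for t in tokens]  # mutable copies so classes can be retyped
    fields = {}

    def live():
        # Tokens retyped to "0" are ignored by the later patterns.
        return [t for t in toks if t[1] != "0"]

    # Step 1 (line 2603): *U | >  with UnitType and UnitValue still empty.
    for i in range(len(toks) - 1):
        a, b = toks[i], toks[i + 1]
        if a[1] == "U" and b[1] == ">" and "UnitType" not in fields:
            fields["UnitType"], fields["UnitValue"] = a[0], b[0]
            a[1] = b[1] = "0"  # RETYPE [1] 0 / RETYPE [2] 0
            break

    # Step 2 (line 3188): ^ | T | D where D is E, N, S or W -> retype D to ^.
    rest = live()
    if [t[1] for t in rest] == ["^", "T", "D"] and rest[2][0] in ("E", "N", "S", "W"):
        rest[2][1] = "^"

    # Step 3 (line 3874): ^ | T | ^ | $ with StreetName still empty.
    rest = live()
    if [t[1] for t in rest] == ["^", "T", "^"] and "StreetName" not in fields:
        fields["HouseNumber"] = rest[0][0]
        fields["StreetPrefixType"] = rest[1][0]
        fields["StreetName"] = rest[2][0]
    return fields

result = replay([("7104", "^"), ("UPPER", "U"), ("17TH", ">"),
                 ("ST", "T"), ("N", "D")])
print(result)
```

Once step 1 has claimed UPPER and 17TH, nothing later can hand them back, which is why the street name ends up as N.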
Line 2667 makes some sort of attempt:
;-------------------------------------------------
; Unit types that do not require a unit value
;-------------------------------------------------
*U =A= "BSMT","FRNT","LOWR","OFC","PH","REAR","SIDE","UPPR" | $ | [ {UnitType} = "" & {UnitValue} = "" ]
COPY_A [1] {UnitValue}
RETYPE [1] 0
But it's too late by this point. Unusual behaviours that need to be tightly controlled should be handled first, before they get a chance to be incorrectly parsed by a more general rule.
Basically, they forgot about the danger of having apples and oranges under the same classification. Kinda violates the whole premise of word pattern analysis.
Having a multi type classification (see M in AUADDR.CLS and Multiple_Semantics in AUADDR.PAT) for terms like Upper, Lower, etc would have made it impossible for this to happen.
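The ordering point can be illustrated with a short Python sketch. Both rules here are hypothetical stand-ins, not lines from USADDR.PAT: `unit_rule` mimics a general "U followed by >" unit rule, and `street_rule` is an invented specific rule that treats ^ U > T as house number, two-word street name and suffix type. The only thing the example is meant to show is that the same two rules give different results depending on which runs first.

```python
def unit_rule(toks, out):
    """General rule: any U followed by > becomes UnitType + UnitValue."""
    for i in range(len(toks) - 1):
        if toks[i][1] == "U" and toks[i + 1][1] == ">":
            out["UnitType"], out["UnitValue"] = toks[i][0], toks[i + 1][0]
            del toks[i:i + 2]  # consumed tokens are no longer available
            return

def street_rule(toks, out):
    """Specific (hypothetical) rule: ^ U > T at the start of the address."""
    if [cls for _, cls in toks[:4]] == ["^", "U", ">", "T"]:
        out["HouseNumber"] = toks[0][0]
        out["StreetName"] = toks[1][0] + " " + toks[2][0]
        out["StreetSuffixType"] = toks[3][0]
        del toks[:4]

def run(rules, tokens):
    """Apply the rules in the given order to a fresh copy of the tokens."""
    toks, out = list(tokens), {}
    for rule in rules:
        rule(toks, out)
    return out

TOKENS = [("7104", "^"), ("UPPER", "U"), ("17TH", ">"), ("ST", "T"), ("N", "D")]

general_first = run([unit_rule, street_rule], TOKENS)
specific_first = run([street_rule, unit_rule], TOKENS)
print(general_first)
print(specific_first)
```

With the general rule first, UPPER and 17TH are consumed as a unit and the specific rule never gets to see them; with the specific rule first, the street name comes out as UPPER 17TH.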
-
- Premium Member
- Posts: 425
- Joined: Sat Nov 19, 2005 9:26 am
- Location: New York City
- Contact:
Stuart,
Agree. Having a multi type classification and Multiple_Semantics should prevent situations where you have tokens containing terms that don't necessarily mean the same thing. I found a couple of places where the rule set provided with the tool could be improved.
Sometimes, even with a properly designed rule set, the input data is very unusual and we have to make our way with user overrides; that's why they take precedence over the rule set pattern file.
Would you believe that in Monty028's case "UPPER 17th ST N" is the street name! :D I'm just guessing here; maybe Monty028 can tell us more.
Julio Rodriguez
ETL Developer by choice
"Sure we have lots of reasons for being rude - But no excuses
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: