Address is not standardizing correctly.

Infosphere's Quality Product

Moderators: chulett, rschirm

monty028
Participant
Posts: 3
Joined: Wed Feb 03, 2010 2:07 pm

Post by monty028 »

Hi experts,

I am new to QualityStage and need your help with one QualityStage job.

I have the address "7104 UPPER 17TH ST N" and am trying to standardize it using the QualityStage USADDR rule set. However, StreetName is being populated with "N", which should be the street suffix, and "UPPER" is being populated in the box type field when it should be populated in StreetName. I would appreciate the opinion of all you experts.

I believe this can be done with a pattern override, but I am not sure how.
If the address is written as "7104 N 17TH ST UPPER", however, it standardizes correctly.

I badly need help from you experts.

Thanks in advance.
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

Yes, this can be done using an Input Pattern Override. Basically, the input pattern assigned to your string is coming out as:

^U>TD, where
^ = 7104
U = UPPER
> = 17th
T = ST
D = N
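To make the pattern above concrete, here is a minimal Python sketch of how an address string could be turned into a pattern such as ^U>TD. It is not how QualityStage is implemented: the `CLASSIFICATION` table and the `classify` and `input_pattern` helpers are hypothetical, and the table holds only the three words from this thread rather than the contents of the real classification file.

```python
# Hypothetical, heavily simplified stand-in for a classification table.
# Only the three words discussed in this thread are included.
CLASSIFICATION = {
    "UPPER": "U",  # unit type
    "ST": "T",     # street type
    "N": "D",      # directional
}


def classify(token: str) -> str:
    """Return a one-character class for a single token."""
    token = token.upper()
    if token in CLASSIFICATION:
        return CLASSIFICATION[token]
    if token.isdigit():
        return "^"  # purely numeric, e.g. 7104
    if token[0].isdigit():
        return ">"  # leading numeric, e.g. 17TH
    # Everything else is lumped together here; the real tool
    # distinguishes more default classes than this sketch does.
    return "+"


def input_pattern(address: str) -> str:
    """Concatenate the classes of all whitespace-separated tokens."""
    return "".join(classify(tok) for tok in address.split())


print(input_pattern("7104 UPPER 17TH ST N"))  # ^U>TD
print(input_pattern("7104 N 17TH ST UPPER"))  # ^D>TU
```

The two addresses from the original question produce different patterns, which is why different rules fire for them.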


Just open the Rules Management window: Standardization Rule ==> USA ==> ADDR, then double-click on the red one (set USADDR).
Double-click on Overrides, then go to the Input Pattern page.

Type your input pattern (^U>TD) in the Enter Input Pattern box.
In the Current Pattern List window, select the first entry in the list (^) and, in User Override, select the classification for the token; in this case, House Number.
Select the second entry in the Current Pattern List window (U) and, in User Override, select the classification for the token; in this case, Street Name.
Repeat the process for every token in your pattern. When done, press the Add button in the Override Summary window and then press the OK button to save the changes.

The input pattern and the override sequence you entered will show up in the Override Summary window. If you have other patterns that you would like to override, repeat the process.

You can test the effect of the overrides in the main window.
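Conceptually, the override built in the steps above behaves like a lookup keyed on the whole input pattern, giving one target field per token. The Python sketch below illustrates that idea only; `OVERRIDES` and `apply_input_pattern_override` are hypothetical names, and the field choices for the last three tokens are an assumption, since only House Number and Street Name are named in the steps.

```python
from typing import Dict, List, Optional

# One target field per token of the pattern, in order.
# The first two entries come from the steps above; the last three
# are assumed for illustration and may not match what you need.
OVERRIDES: Dict[str, List[str]] = {
    "^U>TD": [
        "HouseNumber",            # ^ = 7104
        "StreetName",             # U = UPPER
        "StreetName",             # > = 17TH   (assumed)
        "StreetSuffixType",       # T = ST     (assumed)
        "StreetSuffixDirection",  # D = N      (assumed)
    ],
}


def apply_input_pattern_override(
    pattern: str, tokens: List[str]
) -> Optional[Dict[str, str]]:
    """Return the overridden field values, or None if no override applies."""
    fields = OVERRIDES.get(pattern)
    if fields is None or len(fields) != len(tokens):
        return None
    result: Dict[str, str] = {}
    for field, token in zip(fields, tokens):
        # Tokens sent to the same field are joined with a space.
        result[field] = f"{result[field]} {token}" if field in result else token
    return result


print(apply_input_pattern_override("^U>TD", ["7104", "UPPER", "17TH", "ST", "N"]))
```

A pattern with no entry returns `None`, which stands in for "fall through to the normal pattern file".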
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

I looked at Julio's response and thought "That's weird. Why would he have UPPER as a Unit Type? Surely that must stuff things up, considering you now have a token containing terms that don't necessarily mean the same thing..."

You start with ^ U > T D

Line 2603 of USADDR.PAT
*U | > | [ {UnitType} = "" & {UnitValue} = "" ]
COPY_A [1] {UnitType}
COPY [2] {UnitValue}
RETYPE [1] 0
RETYPE [2] 0

So now you have
UnitType: UPPER
UnitValue: 17th

Remaining Pattern = ^ T D

- Game over. But let's see it out.


Line 3188:
^ | T | D =T= "E","N","S","W"
RETYPE [3] ^

Remaining Pattern = ^ T ^


Line 3874:
^ | T | ^ | $ | [ {StreetName} = "" ] ; Common Street Pattern Found: CALL Address_Type SUBROUTINE then EXIT
COPY [1] {HouseNumber}
COPY_A [2] {StreetPrefixType}
COPY [3] {StreetName}
CALL Address_Type
EXIT


Final score:

UnitType: UPPER
UnitValue: 17th
HouseNumber: 7104
StreetPrefixType: ST
StreetName: N
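The walk-through above can be replayed with a small Python simulation. This is only a sketch of the three rules quoted from USADDR.PAT, not of the pattern action language itself: the `standardize` function is hypothetical, COPY_A is treated as a plain copy (no abbreviation lookup), and a token retyped to 0 is simply skipped in later matches.

```python
from typing import Dict, List, Tuple


def standardize(tokens: List[Tuple[str, str]]) -> Dict[str, str]:
    """Apply the three quoted rules, in file order, to (class, value) tokens."""
    toks = [list(t) for t in tokens]  # mutable copies: [class, value]
    out: Dict[str, str] = {}

    # Rule near line 2603: *U | > with UnitType and UnitValue both empty.
    for i in range(len(toks) - 1):
        if (toks[i][0] == "U" and toks[i + 1][0] == ">"
                and "UnitType" not in out and "UnitValue" not in out):
            out["UnitType"] = toks[i][1]       # COPY_A [1] {UnitType}
            out["UnitValue"] = toks[i + 1][1]  # COPY [2] {UnitValue}
            toks[i][0] = toks[i + 1][0] = "0"  # RETYPE [1] 0, RETYPE [2] 0
            break

    def live() -> List[List[str]]:
        # Tokens still visible to later rules (class 0 is ignored).
        return [t for t in toks if t[0] != "0"]

    # Rule near line 3188: ^ | T | D where D is E, N, S or W.
    cur = live()
    if [c for c, _ in cur] == ["^", "T", "D"] and cur[2][1] in {"E", "N", "S", "W"}:
        cur[2][0] = "^"  # RETYPE [3] ^

    # Rule near line 3874: ^ | T | ^ | $ with StreetName empty.
    cur = live()
    if [c for c, _ in cur] == ["^", "T", "^"] and "StreetName" not in out:
        out["HouseNumber"] = cur[0][1]
        out["StreetPrefixType"] = cur[1][1]
        out["StreetName"] = cur[2][1]

    return out


address = [("^", "7104"), ("U", "UPPER"), (">", "17TH"), ("T", "ST"), ("D", "N")]
print(standardize(address))
```

Running it reproduces the final score: UPPER and 17TH are consumed as unit fields first, which leaves ST and N to be misread as a street prefix type and street name.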


Line 2667 makes some sort of attempt:
;-------------------------------------------------
; Unit types that do not require a unit value
;-------------------------------------------------

*U =A= "BSMT","FRNT","LOWR","OFC","PH","REAR","SIDE","UPPR" | $ | [ {UnitType} = "" & {UnitValue} = "" ]
COPY_A [1] {UnitValue}
RETYPE [1] 0

But it's too late by this point. Unusual behaviours that need to be tightly controlled should be done first, before it gets a chance to be incorrectly parsed by a more general rule.

Basically, they forgot about the danger of having apples and oranges under the classification. Kinda violates the whole premise of word pattern analysis.

Having a multi type classification (see M in AUADDR.CLS and Multiple_Semantics in AUADDR.PAT) for terms like Upper, Lower, etc would have made it impossible for this to happen.
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City

Post by JRodriguez »

Stuart,

Agreed. Having a multi-type classification and Multiple_Semantics should prevent situations where you have tokens containing terms that don't necessarily mean the same thing. I have found a couple of places where the rule set provided with the tool could be improved.

Sometimes, even with a properly designed rule set, the input data is so unusual that we have to make our way with user overrides. That's why they take precedence over the rule set pattern file.

Would you believe that in Monty028's case "UPPER 17th ST N" is the street name! :D I'm just guessing here; maybe Monty028 can tell us more.
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Good points. Just remember that the two "unhandled" user overrides are applied after the pattern action language script has been executed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.