parsing address informaiton

hobocamp · Post by **hobocamp** » Wed Feb 04, 2009 6:16 pm

I've searched a lot for this, and haven't found anything, though I would imagine it's been a common question so here goes -

I have some trimmed address fields containing city, state , zip info (exp: MilwaukeeWI54621), where I need to parse the values out into separate fields. After working various functions (Index, Len, etc.) I haven't come up with a good way to do it. Could anyone kindly offer any tips or suggestions?

Thanks in advance.

Tom

ray.wurlod · Post by **ray.wurlod** » Wed Feb 04, 2009 6:23 pm

QualityStage has this functionality straight out of the box, and can handle misspelled entries as well. You would simply insert a standardization stage that uses the USAREA rule set into your job.

If you don't have QualityStage licensed then the MatchField() function may stand you in good stead in server jobs. For example for the data you have provided:

Code: Select all

City    MatchField(InLink.Address, "0X2A5N", 1)
State   MatchField(InLink.Address, "0X2A5N", 2)
Zip     MatchField(InLink.Address, "0X2A5N", 3)

JRodriguez · Post by **JRodriguez** » Wed Feb 04, 2009 10:01 pm

Tom,

If all your records follows same standard my suggestion is to parse them using a combination of Oconv, Len and substrim [] functions as follow ( You want to use stage variables for best performance )

Using your example MilwaukeeWI54621

ZipCode = OCONV(InputField,"MCN") This will extracts all numeric characters 54621
StateCode = (OCONV(InputField,"MCA"))[2] This will extracts last two alpha characters WI
AlphaLen = Len(OCONV(trim(InputField),"MCA")) This get the Len
StateDescription = OCONV(trim(InputField),"MCA")[ 1, (AlphaLen - 2)] This will take the all alpha chararacters and leave out the last two Milwaukee

Quality Stage US area rule set will do the same but you will need to customize the Rule Set because the data does not have a separator between tokens (words), so you will need to adjust the rule a little bit to parse this kind of record for you

ray.wurlod · Post by **ray.wurlod** » Thu Feb 05, 2009 1:03 am

JRodriguez wrote:Quality Stage US area rule set will do the same but you will need to customize the Rule Set because the data does not have a separator between tokens (words), so you will need to adjust the rule a little bit to parse this kind of record for you

Not true. You can define the data to be fixed width and organize that it is.

hobocamp · Post by **hobocamp** » Thu Feb 05, 2009 9:23 am

Ray and JRodriguez -

Thank you both for your suggestions. From my testing, it appears that the OCONV functions will be the way to go for now.

Ray - I have to admit I wasn't even aware of the MatchField function, but I am intrigued by it, and anxious to see how I might use it in future applications.

(I realized that Quality Stage would have been a good solution for this type of issue, but the license for it unfortunately isn't in our budget for this year.)

I'll mark this issue as Resolved.
Thanks again.
Tom

ray.wurlod · Post by **ray.wurlod** » Thu Feb 05, 2009 12:12 pm

One of the upcoming Tips and Tricks papers at DSXchange Learning Center deals with pattern matching (including the MatchField() function) in DataStage.

DSXchange

parsing address informaiton

parsing address informaiton

Re: parsing address informaiton

Re: parsing address informaiton