parsing address informaiton

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
hobocamp
Premium Member
Premium Member
Posts: 98
Joined: Thu Aug 31, 2006 10:04 am

parsing address informaiton

Post by hobocamp »

I've searched a lot for this, and haven't found anything, though I would imagine it's been a common question so here goes -

I have some trimmed address fields containing city, state , zip info (exp: MilwaukeeWI54621), where I need to parse the values out into separate fields. After working various functions (Index, Len, etc.) I haven't come up with a good way to do it. Could anyone kindly offer any tips or suggestions?

Thanks in advance.

Tom
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

QualityStage has this functionality straight out of the box, and can handle misspelled entries as well. You would simply insert a standardization stage that uses the USAREA rule set into your job.

If you don't have QualityStage licensed then the MatchField() function may stand you in good stead in server jobs. For example for the data you have provided:

Code: Select all

City    MatchField(InLink.Address, "0X2A5N", 1)
State   MatchField(InLink.Address, "0X2A5N", 2)
Zip     MatchField(InLink.Address, "0X2A5N", 3)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
JRodriguez
Premium Member
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City
Contact:

Re: parsing address informaiton

Post by JRodriguez »

Tom,

If all your records follows same standard my suggestion is to parse them using a combination of Oconv, Len and substrim [] functions as follow ( You want to use stage variables for best performance )


Using your example MilwaukeeWI54621

ZipCode = OCONV(InputField,"MCN") This will extracts all numeric characters 54621
StateCode = (OCONV(InputField,"MCA"))[2] This will extracts last two alpha characters WI
AlphaLen = Len(OCONV(trim(InputField),"MCA")) This get the Len
StateDescription = OCONV(trim(InputField),"MCA")[ 1, (AlphaLen - 2)] This will take the all alpha chararacters and leave out the last two Milwaukee

Quality Stage US area rule set will do the same but you will need to customize the Rule Set because the data does not have a separator between tokens (words), so you will need to adjust the rule a little bit to parse this kind of record for you
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Re: parsing address informaiton

Post by ray.wurlod »

JRodriguez wrote:Quality Stage US area rule set will do the same but you will need to customize the Rule Set because the data does not have a separator between tokens (words), so you will need to adjust the rule a little bit to parse this kind of record for you
Not true. You can define the data to be fixed width and organize that it is.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
hobocamp
Premium Member
Premium Member
Posts: 98
Joined: Thu Aug 31, 2006 10:04 am

Post by hobocamp »

Ray and JRodriguez -

Thank you both for your suggestions. From my testing, it appears that the OCONV functions will be the way to go for now.

Ray - I have to admit I wasn't even aware of the MatchField function, but I am intrigued by it, and anxious to see how I might use it in future applications.

(I realized that Quality Stage would have been a good solution for this type of issue, but the license for it unfortunately isn't in our budget for this year.)

I'll mark this issue as Resolved.
Thanks again.
Tom
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

One of the upcoming Tips and Tricks papers at DSXchange Learning Center deals with pattern matching (including the MatchField() function) in DataStage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply