I've searched a lot for this, and haven't found anything, though I would imagine it's been a common question so here goes -
I have some trimmed address fields containing city, state , zip info (exp: MilwaukeeWI54621), where I need to parse the values out into separate fields. After working various functions (Index, Len, etc.) I haven't come up with a good way to do it. Could anyone kindly offer any tips or suggestions?
Thanks in advance.
Tom
parsing address informaiton
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
QualityStage has this functionality straight out of the box, and can handle misspelled entries as well. You would simply insert a standardization stage that uses the USAREA rule set into your job.
If you don't have QualityStage licensed then the MatchField() function may stand you in good stead in server jobs. For example for the data you have provided:
If you don't have QualityStage licensed then the MatchField() function may stand you in good stead in server jobs. For example for the data you have provided:
Code: Select all
City MatchField(InLink.Address, "0X2A5N", 1)
State MatchField(InLink.Address, "0X2A5N", 2)
Zip MatchField(InLink.Address, "0X2A5N", 3)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 425
- Joined: Sat Nov 19, 2005 9:26 am
- Location: New York City
- Contact:
Re: parsing address informaiton
Tom,
If all your records follows same standard my suggestion is to parse them using a combination of Oconv, Len and substrim [] functions as follow ( You want to use stage variables for best performance )
Using your example MilwaukeeWI54621
ZipCode = OCONV(InputField,"MCN") This will extracts all numeric characters 54621
StateCode = (OCONV(InputField,"MCA"))[2] This will extracts last two alpha characters WI
AlphaLen = Len(OCONV(trim(InputField),"MCA")) This get the Len
StateDescription = OCONV(trim(InputField),"MCA")[ 1, (AlphaLen - 2)] This will take the all alpha chararacters and leave out the last two Milwaukee
Quality Stage US area rule set will do the same but you will need to customize the Rule Set because the data does not have a separator between tokens (words), so you will need to adjust the rule a little bit to parse this kind of record for you
If all your records follows same standard my suggestion is to parse them using a combination of Oconv, Len and substrim [] functions as follow ( You want to use stage variables for best performance )
Using your example MilwaukeeWI54621
ZipCode = OCONV(InputField,"MCN") This will extracts all numeric characters 54621
StateCode = (OCONV(InputField,"MCA"))[2] This will extracts last two alpha characters WI
AlphaLen = Len(OCONV(trim(InputField),"MCA")) This get the Len
StateDescription = OCONV(trim(InputField),"MCA")[ 1, (AlphaLen - 2)] This will take the all alpha chararacters and leave out the last two Milwaukee
Quality Stage US area rule set will do the same but you will need to customize the Rule Set because the data does not have a separator between tokens (words), so you will need to adjust the rule a little bit to parse this kind of record for you
Julio Rodriguez
ETL Developer by choice
"Sure we have lots of reasons for being rude - But no excuses
ETL Developer by choice
"Sure we have lots of reasons for being rude - But no excuses
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Re: parsing address informaiton
Not true. You can define the data to be fixed width and organize that it is.JRodriguez wrote:Quality Stage US area rule set will do the same but you will need to customize the Rule Set because the data does not have a separator between tokens (words), so you will need to adjust the rule a little bit to parse this kind of record for you
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray and JRodriguez -
Thank you both for your suggestions. From my testing, it appears that the OCONV functions will be the way to go for now.
Ray - I have to admit I wasn't even aware of the MatchField function, but I am intrigued by it, and anxious to see how I might use it in future applications.
(I realized that Quality Stage would have been a good solution for this type of issue, but the license for it unfortunately isn't in our budget for this year.)
I'll mark this issue as Resolved.
Thanks again.
Tom
Thank you both for your suggestions. From my testing, it appears that the OCONV functions will be the way to go for now.
Ray - I have to admit I wasn't even aware of the MatchField function, but I am intrigued by it, and anxious to see how I might use it in future applications.
(I realized that Quality Stage would have been a good solution for this type of issue, but the license for it unfortunately isn't in our budget for this year.)
I'll mark this issue as Resolved.
Thanks again.
Tom
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
One of the upcoming Tips and Tricks papers at DSXchange Learning Center deals with pattern matching (including the MatchField() function) in DataStage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.