split city and state data from one filed into two fields
Moderators: chulett, rschirm, roy
split city and state data from one filed into two fields
I want to split city and state in to two fields from one field as in below example:
field: ADDRESS
MAYVILLE WI
SPOKANE WA
FREEBURG IL
ST. LOUIS, MO
MILFORD C.T
FREEBURG, I.L
Results should look like this:
fileds: CITY | STATE
MAYVILLE | WI
SPOKANE | WA
FREEBURG | IL
ST.LOUIS | MO
MILDORD | CT
FREEBURG | IL
thank you
Ashok
field: ADDRESS
MAYVILLE WI
SPOKANE WA
FREEBURG IL
ST. LOUIS, MO
MILFORD C.T
FREEBURG, I.L
Results should look like this:
fileds: CITY | STATE
MAYVILLE | WI
SPOKANE | WA
FREEBURG | IL
ST.LOUIS | MO
MILDORD | CT
FREEBURG | IL
thank you
Ashok
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
I noticed in another thread that you have QualityStage installed. Why not run these fields through standard address parsing through a QualityStage plugin? It should be able to seperate the city from the state and deliver them back in seperate fields.
If you were doing it in DataStage you need to make certain assumptions. Are city and state always seperated by a space? Does the state always appear at the end of the string? Is state always a two character code?
In a routine I would split off the last two characters after the last space using the DCOUNT and FIELD commands. I would then remove any non alpha charcters from this string. I would then pass this to a state code lookup, either by hard coding state codes in a routine or doing a transformer lookup against a hash file. Anything that does not produce a valid state code gets rejected.
The remainder of the string becomes the city. Trim off from the end of the city name any non alpha characters (eg. ST.LOUIS, becomes ST.LOUIS).
If you were doing it in DataStage you need to make certain assumptions. Are city and state always seperated by a space? Does the state always appear at the end of the string? Is state always a two character code?
In a routine I would split off the last two characters after the last space using the DCOUNT and FIELD commands. I would then remove any non alpha charcters from this string. I would then pass this to a state code lookup, either by hard coding state codes in a routine or doing a transformer lookup against a hash file. Anything that does not produce a valid state code gets rejected.
The remainder of the string becomes the city. Trim off from the end of the city name any non alpha characters (eg. ST.LOUIS, becomes ST.LOUIS).
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
If you really want to do it in DataStage you can, though I'd fully support your using Quality Stage - after all, "bucketing" address components is one of the things it does supremely well. For those who don't have Quality Stage...
A "final delimited field" function is handy thing to have in one's tool kit. Here's a cut down version (with no error reporting).
State could be derived, then, as FinalDelimitedString(Trim(Address)," ")
The remainder of the address (city) could be handled with something similar, to return all but the final delimited field.
The TrimB function is insurance against the Address field containing trailing spaces.
A "final delimited field" function is handy thing to have in one's tool kit. Here's a cut down version (with no error reporting).
Code: Select all
FUNCTION FinalDelimitedField(String, Delimiter)
* Author: Ray Wurlod
If UnAssigned(String) Or IsNull(String) Or UnAssigned(Delimiter) Or IsNull(Delimiter)
Then
Ans = @NULL
End
Else
If Len(Delimiter) = 1
Then
FieldCount = DCount(String, Delimiter)
Ans = Field(String, Delimiter, FieldCount, 1)
End
Else
Ans = @NULL
End
End
RETURN(Ans)
The remainder of the address (city) could be handled with something similar, to return all but the final delimited field.
Code: Select all
Field(TrimB(Address), " ", 1, Count(TrimB(Address), " "))
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
And somewhere in there, either at the start or at the end, remove the comma delimiters from the city name.
Code: Select all
Address = EREPLACE(Address, ",", "")
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Ashok
There is a separate forum for Quality Stage and post your questions there for quick reply. Not all knows Quality Stage in this forum to help you.
Thanks
There is a separate forum for Quality Stage and post your questions there for quick reply. Not all knows Quality Stage in this forum to help you.
Thanks
Regards
Siva
Listening to the Learned
"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
Siva
Listening to the Learned
"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: