help needed, How to remove characters in data containing like space, underscore, coma, period ect... and collect next letters
example
1) John_Big is in comeing data and I need out put as JohnB
2) Tim,Jr the out put should be TimJr
How to remove space, coma, period from incpmeing data
Moderators: chulett, rschirm, roy
Take a look through the Transforms that come with the product. For example, there is one called DIGITS that removes all non-numerics from a string. I'm sure there's some sort of alpha equivalent as well.
Just checked and there is - it's called LETTERS.
Looking at them, you'll see that they simply invoke Oconv with different conversion codes. You could do the same thing yourself, but which is clearer - "Oconv(YourField,"MCA")" or "LETTERS(YourField)"?![Wink :wink:](./images/smilies/icon_wink.gif)
Just checked and there is - it's called LETTERS.
Looking at them, you'll see that they simply invoke Oconv with different conversion codes. You could do the same thing yourself, but which is clearer - "Oconv(YourField,"MCA")" or "LETTERS(YourField)"?
![Wink :wink:](./images/smilies/icon_wink.gif)
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
You should also look at QualityStage that handles free text standardisation better then DataStage. Part of the problem is keeping track of all the rules and combination of names. Are you always getting "junior" as Jr or are you also getting Jnr, Junior, J or Jun.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Re: How to remove space, coma, period from incpmeing data
[quote="ashok"]help needed, How to remove characters in data containing like space, underscore, coma, period ect... and collect next letters
example
1) John_Big is in comeing data and I need out put as JohnB
2) Tim,Jr the out put should be TimJr[/quote]
Did you try to use change or ereplace function from transformer?
change(<input_column>,"/","") replaces all "/" with empty string.
I hope this hepls you. if want to replce both space and "/" then try this one
change(change(<input_column>,"/","")," ","")
Thanks
Kura
example
1) John_Big is in comeing data and I need out put as JohnB
2) Tim,Jr the out put should be TimJr[/quote]
Did you try to use change or ereplace function from transformer?
change(<input_column>,"/","") replaces all "/" with empty string.
I hope this hepls you. if want to replce both space and "/" then try this one
change(change(<input_column>,"/","")," ","")
Thanks
Kura
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
Removing the non character fields is the easy bit, there are a dozen ways to do it, parsing the text left over is the hard bit. If someone has three names how do you tell which is the first name, middle name and surname?
Mary Ann Flanders :- could be Mary-Ann Flanders or Mary Ann-Flanders or Mary Ann Flanders. No way of telling the difference between first, middle and last name. You need to find out whether these extra "," and "_" characters have business rules behind them to help you parse the text they are delimiting. Eg. does a comma always indicate a name suffix, does an underscore always precede a surname. It is likely that the "-" character indicates two names that are really one so you don't want to remove those from the first name (Mary-AnnF) but you do want to remove them from a capitalised surname (MaryA).
Mary Ann Flanders :- could be Mary-Ann Flanders or Mary Ann-Flanders or Mary Ann Flanders. No way of telling the difference between first, middle and last name. You need to find out whether these extra "," and "_" characters have business rules behind them to help you parse the text they are delimiting. Eg. does a comma always indicate a name suffix, does an underscore always precede a surname. It is likely that the "-" character indicates two names that are really one so you don't want to remove those from the first name (Mary-AnnF) but you do want to remove them from a capitalised surname (MaryA).
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
-
- Participant
- Posts: 75
- Joined: Tue May 13, 2003 4:14 am
- Location: California
- Contact:
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: