Page 1 of 1

How to remove space, coma, period from incpmeing data

Posted: Sun Sep 26, 2004 10:19 am
by ashok
help needed, How to remove characters in data containing like space, underscore, coma, period ect... and collect next letters
example
1) John_Big is in comeing data and I need out put as JohnB
2) Tim,Jr the out put should be TimJr

Posted: Sun Sep 26, 2004 10:51 am
by chulett
Take a look through the Transforms that come with the product. For example, there is one called DIGITS that removes all non-numerics from a string. I'm sure there's some sort of alpha equivalent as well.

Just checked and there is - it's called LETTERS.

Looking at them, you'll see that they simply invoke Oconv with different conversion codes. You could do the same thing yourself, but which is clearer - "Oconv(YourField,"MCA")" or "LETTERS(YourField)"? :wink:

Posted: Sun Sep 26, 2004 3:17 pm
by vmcburney
You should also look at QualityStage that handles free text standardisation better then DataStage. Part of the problem is keeping track of all the rules and combination of names. Are you always getting "junior" as Jr or are you also getting Jnr, Junior, J or Jun.

Re: How to remove space, coma, period from incpmeing data

Posted: Sun Sep 26, 2004 4:46 pm
by kura
[quote="ashok"]help needed, How to remove characters in data containing like space, underscore, coma, period ect... and collect next letters
example
1) John_Big is in comeing data and I need out put as JohnB
2) Tim,Jr the out put should be TimJr[/quote]


Did you try to use change or ereplace function from transformer?

change(<input_column>,"/","") replaces all "/" with empty string.

I hope this hepls you. if want to replce both space and "/" then try this one
change(change(<input_column>,"/","")," ","")

Thanks
Kura

Posted: Sun Sep 26, 2004 5:10 pm
by vmcburney
Removing the non character fields is the easy bit, there are a dozen ways to do it, parsing the text left over is the hard bit. If someone has three names how do you tell which is the first name, middle name and surname?

Mary Ann Flanders :- could be Mary-Ann Flanders or Mary Ann-Flanders or Mary Ann Flanders. No way of telling the difference between first, middle and last name. You need to find out whether these extra "," and "_" characters have business rules behind them to help you parse the text they are delimiting. Eg. does a comma always indicate a name suffix, does an underscore always precede a surname. It is likely that the "-" character indicates two names that are really one so you don't want to remove those from the first name (Mary-AnnF) but you do want to remove them from a capitalised surname (MaryA).

Posted: Mon Sep 27, 2004 5:32 pm
by gh_amitava
Hi,

If you are designing jobs in PX environment, don't use any basic function which are used in server version like DIGIT etc. It will minimize the efficiency of your program. Use Convert function in PX Transformer. The syntax is available in the help.

Regards
Amitava

Posted: Mon Sep 27, 2004 9:41 pm
by ray.wurlod
The original poster specifed server job, and posted in the server-job forum. That answer was, therefore, not entirely helpful.