Replace function (or StringDecode routine)

newtods
Participant
Posts: 7
Joined: Thu Jan 02, 2003 12:55 pm

Post by newtods »

In order to standardize customer names and addresses, I'd like to create a function (or use an existing one) that replaces occurrences of words. For example:
Customer Name: "XYZ COMPANY" change to: "XYZ CO"
Address: "201 RAMPART PLACE" change to "201 RAMPART PL";
or: "100 WIDE ROAD, HIGHWAY 11" change to: "100 WIDE RD, HWY 11".
I believe I need a table of all the replacements (COMPANY=CO, PLACE=PL, ROAD=RD), and I'd like to create a function that replaces every occurrence of a "from" word with its "to" word. The word might be at the start, middle or end of the field, but it is always a whole token (never a substring of a longer word).
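A minimal sketch of that table-driven, whole-token replacement, written in Python only because DS BASIC can't be run outside DataStage. The `REPLACEMENTS` table and the `standardize` name are hypothetical. Regex word boundaries (`\b`) stand in for "always a token", so the comma after ROAD doesn't block a match, while ROAD inside BROADWAY is left alone. Matching is case-sensitive, so the data would need to be uppercased first.

```python
import re

# Hypothetical replacement table built from the pairs in the post.
REPLACEMENTS = {
    "COMPANY": "CO",
    "PLACE": "PL",
    "ROAD": "RD",
    "HIGHWAY": "HWY",
}

# One alternation anchored at word boundaries; longer keys come first so that,
# if one key were a prefix of another, the longer one would win.
_PATTERN = re.compile(
    r"\b("
    + "|".join(sorted(map(re.escape, REPLACEMENTS), key=len, reverse=True))
    + r")\b"
)


def standardize(text: str) -> str:
    """Replace every whole-word 'from' token with its 'to' abbreviation."""
    return _PATTERN.sub(lambda m: REPLACEMENTS[m.group(1)], text)


print(standardize("XYZ COMPANY"))                # XYZ CO
print(standardize("201 RAMPART PLACE"))          # 201 RAMPART PL
print(standardize("100 WIDE ROAD, HIGHWAY 11"))  # 100 WIDE RD, HWY 11
```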
I've tried working with the StringDecode routine in DStransform but can't figure out how to use it in the Designer. It seems that an array has to be pre-loaded into memory first, and then that array is used for the translation.
Any advice is greatly appreciated.
Thank you.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well, can't really help with your array issue but did want to point out a more 'brute force' method. I had to do something similar with strings we were tokenizing and we wanted to standardize certain words before we broke the string up. I ended up with a routine I can call in a job that 'standardised' the data before extracting and classifying the resulting tokens. I used a long series of 'ereplace' calls, something like:

NewField = Ereplace(NewField," NORTH EAST "," NE ")
NewField = Ereplace(NewField," COMPANY "," CO ")
NewField = Ereplace(NewField," HIGHWAY "," HWY ")

Etc. I was able to remove all punctuation, so I didn't have to worry about commas the way you may. I also put an extra space at the front and end of the string (removed later) so that my replaces would catch the first and last tokens. I didn't want to do substring replacement either, and putting spaces around the words worked well for me. It's not as elegant as building table-driven arrays, but it may be something to consider if you can't get that working.
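Craig's padding trick, sketched in Python on the assumption that `str.replace` behaves like a run of Ereplace() calls. The `PAIRS` list and the `standardize` name are illustrative only: punctuation is stripped, the string is padded with one space at each end, each padded token is replaced in turn, and the padding is removed.

```python
import string

# Hypothetical (from, to) pairs, applied in order like successive Ereplace() calls.
PAIRS = [("NORTH EAST", "NE"), ("COMPANY", "CO"), ("HIGHWAY", "HWY")]


def standardize(field: str) -> str:
    # Drop punctuation first so a comma can't stop " HIGHWAY " from matching.
    cleaned = field.translate(str.maketrans("", "", string.punctuation))
    # Pad both ends so the first and last tokens are also surrounded by spaces;
    # matching " TOKEN " then only ever hits whole tokens.
    padded = " " + cleaned + " "
    for old, new in PAIRS:
        padded = padded.replace(" " + old + " ", " " + new + " ")
    # Remove the padding again.
    return padded[1:-1]


print(standardize("ACME COMPANY, HIGHWAY 11"))  # ACME CO HWY 11
```

One caveat: matches don't overlap, so in this sketch (and probably with Ereplace() too) two consecutive tokens that both need replacing share a space, and "COMPANY COMPANY" comes out as "CO COMPANY" after one pass. Running the loop a second time catches the remaining token.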

-craig
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

No matter what you do, you're going to have to use some form of brute-force method: either multiple Ereplace() functions, as Craig suggests, or code that iterates through your "array loaded from memory".
How were you planning to do the "array in memory" (probably two arrays, one for the "from" tokens, the other for the "to" tokens)?
One possibly useful approach is to load these into variables declared to be COMMON (so that they are loaded once and persist for the duration of the active stage's execution) in your own transformation function (Routine). Such variables are automatically initialized to zero, so there is an easy test within your function to see whether they still need to be loaded.
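Ray's suggestion translated into a Python sketch, assuming module-level variables are a fair stand-in for COMMON: they start as `None` (zero in DS BASIC), the first call loads two parallel "from"/"to" arrays, and later calls reuse them. The pipe-delimited table, `_load_tables` and `LOAD_CALLS` are hypothetical; a real routine might read the pairs from a hashed file instead.

```python
# Module-level variables stand in for DS BASIC COMMON variables: they start
# empty (None here, zero in DS BASIC) and are filled on the first call only.
_FROM_TOKENS = None
_TO_TOKENS = None
LOAD_CALLS = 0  # counts how often the (hypothetical) table load runs


def _load_tables():
    """Hypothetical loader; a real routine might read a hashed file instead."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    raw = "COMPANY=CO|PLACE=PL|ROAD=RD|HIGHWAY=HWY"
    pairs = [item.split("=", 1) for item in raw.split("|")]
    # Two parallel arrays: one for the "from" tokens, one for the "to" tokens.
    return [p[0] for p in pairs], [p[1] for p in pairs]


def standardize(field):
    global _FROM_TOKENS, _TO_TOKENS
    # Same idea as testing whether the COMMON variables are still zero.
    if _FROM_TOKENS is None:
        _FROM_TOKENS, _TO_TOKENS = _load_tables()
    padded = " " + field + " "
    # Walk both arrays together, one whole-token replace per pair.
    for old, new in zip(_FROM_TOKENS, _TO_TOKENS):
        padded = padded.replace(" " + old + " ", " " + new + " ")
    return padded[1:-1]


print(standardize("XYZ COMPANY"))        # XYZ CO
print(standardize("201 RAMPART PLACE"))  # 201 RAMPART PL
print(LOAD_CALLS)                        # 1
```

As with the padding approach, punctuation attached to a token (ROAD,) blocks the match unless it is stripped first.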
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

Where addresses are concerned, it is best to use an address-checking tool rather than coding something yourself. Any national or international postal service can point you to suitable software or services.

This method has several advantages:

(1) When you use these addresses for actually sending out letters and parcels, or with GPS routing software in cars, you get the real address, which fits any system and is read best by postal sorting machines, because it is 100% correct and involves no guessing.

(2) If one day you match your addresses against others that you buy or obtain, you can easily cross-check them against the ones you already have.

(3) These services can also check whether the person still lives at that address.

And so on.

Never put a lot of effort into building something yourself when good software is already on the market.

Wolfgang
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Wholeheartedly agree, and you are preaching to the choir, so to speak. Heck, INTEGRITY XE has all this and much more built into it. HOWEVER, your suggestion has one big disadvantage as well: cost. Many shops nowadays are extremely budget-constrained, and as much as I'd *love* to buy software that will solve my problems, it can be out of the question. So... you end up putting something together yourself.

-craig
newtods
Participant
Posts: 7
Joined: Thu Jan 02, 2003 12:55 pm

Post by newtods »

Thank you all for help and suggestions.
I agree with Craig: buying more software might save some development time, but it would cost money that my company is not ready to spend at this time.
I did find a way to use the StringDecode function. We were wrong to think that we needed to load the arguments into memory first; once the function is part of a job, it works the way it should.
Now my question is how to iterate through the string. Inside my Transformer I have a string coming from one stage, "123 Basic Avenue". I want this string converted to "123 Basic Ave" and written to another stage. My derivation for "Avenue" looks like this:
If StringDecode(DSLink5.ADDRESS1,"STREET=St.|Avenue=Ave") ="" Then DSLink5.ADDRESS1 Else StringDecode(DSLink5.ADDRESS1, "STREET=St.|Avenue=Ave" )
Could you help me build a function that works over the whole incoming string?
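A Python sketch of one way to extend that derivation to the whole string: split it on spaces, apply the same If/Then/Else to each token, and join the tokens back together. It assumes, based only on the derivation above, that StringDecode looks up the entire value in a "KEY=VALUE|KEY=VALUE" list and returns "" when there is no match; `string_decode` and `decode_tokens` are hypothetical stand-ins, not the SDK routine itself. In a DataStage routine, the loop could presumably be written with Dcount() and Field() on a space delimiter.

```python
def string_decode(value: str, table: str) -> str:
    """Hypothetical stand-in for StringDecode: look up the whole value in a
    'KEY=VALUE|KEY=VALUE' list and return "" when there is no match."""
    for entry in table.split("|"):
        key, _, decoded = entry.partition("=")
        if value == key:
            return decoded
    return ""


def decode_tokens(address: str, table: str) -> str:
    """Apply the poster's If/Then/Else derivation to every space-separated token."""
    out = []
    for token in address.split(" "):
        decoded = string_decode(token, table)
        # Same fallback as the derivation: keep the original token when not found.
        out.append(token if decoded == "" else decoded)
    return " ".join(out)


TABLE = "STREET=St.|Avenue=Ave"
print(decode_tokens("123 Basic Avenue", TABLE))  # 123 Basic Ave
```

Because each lookup compares the whole token, "Street" won't match the STREET key and "Avenue," with a trailing comma won't match either, so case and punctuation would need normalizing first.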