Hi ,
I have to clease and match address records, one source has ST and ST. and other has STREET , many more like this(like AVE. CIRC.). i have to write routine or or unix script to cleanse this data and spell out so that I can match 2 records coming from different sources ,
i have simple routine :
LINE1_TEXT =Arg1
if (LINE1_TEXT='RD' or LINE1_TEXT='RD.') then Ans='ROAD'
else if LINE1_TEXT='ST' or LINE1_TEXT='ST.' then Ans='STREET'
else Ans=LINE1_TEXT
But when I am passing line 1 of address as an argument, it is not taking the whole string , as it just identifies 'ST' ot 'ST.' as argument .
Is there any function which will take whole line1 column as argument and changes all the words like ST. or LN or N. to STREET or LANE or NORTH,
I looked into ereplace or change function but not sure how to use it in routine ,
Please suggest .
Thanks .
address cleansing
Moderators: chulett, rschirm, roy
Re: address cleansing
You can't directly apply an Ereplace or similar functions on the entire Address1 string because it would change "123 STRAIT ST." to "123 STREETRAIT STREET".
This kind of work really requires specialized tools like QualityStage. However, for simplistic solutions, analyze your data. If variations for STREET, like ST or ST. or STR, and AVE or AVE. for AVENUE, etc only appear at the end of the first line of address, then you could use the following code -
This kind of work really requires specialized tools like QualityStage. However, for simplistic solutions, analyze your data. If variations for STREET, like ST or ST. or STR, and AVE or AVE. for AVENUE, etc only appear at the end of the first line of address, then you could use the following code -
Code: Select all
vAddress1 = Trim(UpCase(InputLink.Address1))
*Get the number spaces in Address1
vSpaceCnt = Count(vAddress1)
*Process only those addresses with more than 1 words, so BRAVE does not transform to "BRAVENUE"
If vSpaceCnt > 0 Then
*Check Last word
vLastWord = Field(vAddress1, " ", vSpaceCnt + 1)
If vLastWord = "ST" or vLastWord = "ST." or vLastWord = "STR" or vLastWord = "STR." Then "STREET"
If vLastWord = "AVE" or vLastWord = "AVE." Then "AVENUE"
*You can perform other replacements here
End
gateleys
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Use QualityStage - this is precisely the kind of thing (OK, one of the things) it's designed to do. It even correctly figures out that 125 ST STEVEN ST should be 125 SAINT STEVEN STREET.
The technique is called standardization. A QualityStage standardization task can be included in a DataStage job flow.
The technique is called standardization. A QualityStage standardization task can be included in a DataStage job flow.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
The amount of time you need to account for all of the variations (ST, St, st, Street, STREET, et cetera) is better done by using QualityStage.
In fact, as part of the 8.x suite, you will have QualityStage within the whole framework, making it so valuable and easier to use.
The alternative solution would require custom coding that would be expensive if you hire a consultant to do so.
In fact, as part of the 8.x suite, you will have QualityStage within the whole framework, making it so valuable and easier to use.
The alternative solution would require custom coding that would be expensive if you hire a consultant to do so.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
A couple of small but necessary corrections to Teej's post.
Even in version 8.x QualityStage remains separately licensed. That is, you can run DataStage without QualityStage - the data quality stage types remain disabled.
Second, if you are a server-only shop, you do NOT get all the new goodies. You still have to rely on the QualityStage plug-in for server jobs to invoke separately designed version 7.5 style QualityStage tasks.
The new goodies are available only in the parallel execution environment.
(I felt that was a pertinent comment since we're in the Server forum here.)
Even in version 8.x QualityStage remains separately licensed. That is, you can run DataStage without QualityStage - the data quality stage types remain disabled.
Second, if you are a server-only shop, you do NOT get all the new goodies. You still have to rely on the QualityStage plug-in for server jobs to invoke separately designed version 7.5 style QualityStage tasks.
The new goodies are available only in the parallel execution environment.
(I felt that was a pertinent comment since we're in the Server forum here.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
That's probably because your site does not have QualityStage and is not willing to make the investment.knowledge wrote:My manager doesn't want to use quality stage (dont know why)
As I and others have pointed out, this is the domain of tools like QualityStage. However, if your address parsing and correction requirements are not very exhaustive, then you can use DataStage. Did you try to implement the code that I posted above?knowledge wrote: I wanted to do it in datastage but it seems it is very difficult to do it in datatstage.
gateleys
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Who's going to maintain this code into the future as the requirements increase - particularly after you've moved on? Can you guarantee to meet key performance indicators (whatever "performance" is) using a hand-coded approach? What testing regime exists? Do you know what all the possible values to be cleansed are? What would you do with "CR"?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.