Can I split a token in 2 new tokens?
Hi,
I'm standardizing Italian addresses (but I'll use an English address as an example).
I've got a certain number of addresses like this:
SUNSET BOULEVARD12
Does anybody know a smart way to get:
SUNSET BOULEVARD 12
???
I already know a "smart but not optimal" way: I can create a special rule set where the SEPLIST is:
SEPLIST "1234567890"
and the only rules (assuming house numbers have at most 5 digits) are:
*^|^|^|^|^|$
COPY [1] TEMP
CONCAT [2] TEMP
CONCAT [3] TEMP
CONCAT [4] TEMP
CONCAT [5] TEMP
RETYPE [1] + TEMP TEMP
RETYPE [2] 0
RETYPE [3] 0
RETYPE [4] 0
RETYPE [5] 0
*^|^|^|^|$
.............
*^|^|^|$
............
*^|^|$
............
**
COPY_S [1] {UD}
Applying this rule set I'd reach my goal; the UD field would become the new input for my original rule set.
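For readers who don't work in QualityStage, the effect of this workaround can be sketched in ordinary Python (this is illustrative only, not QualityStage code; the function name is mine): the SEPLIST makes every digit its own token, and the rules then CONCAT the digit tokens back into one numeric token.

```python
import re

def split_trailing_number(addr: str) -> str:
    """Mimic the workaround: tokenise the string so each digit is its
    own token (as SEPLIST "1234567890" would), then re-join consecutive
    digit tokens into a single space-separated numeric token."""
    # "\d" matches one digit, "\D+" matches a run of non-digits.
    tokens = re.findall(r"\d|\D+", addr)
    out, number = [], ""
    for t in tokens:
        if t.isdigit():
            number += t          # like CONCAT-ing each digit onto TEMP
        else:
            if number:
                out.append(number)
                number = ""
            out.append(t.strip())
    if number:
        out.append(number)
    return " ".join(out)

print(split_trailing_number("SUNSET BOULEVARD12"))  # SUNSET BOULEVARD 12
```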
Many thanks
- Premium Member
- Posts: 301
- Joined: Thu Jul 14, 2005 10:27 am
- Location: Melbourne, Australia
Flavour,
If you do an investigate on your example input you'll see it's classified as "<" (Leading Alphabetic). You can create a simple pattern to match your leading alphabetic token and then use the "-n" (trailing numeric characters) and "c" (leading alphabetic characters) options with COPY to split your token. E.g. For SUNSET BOULEVARD12 you'd use ...
See the "Copying Leading and Trailing Characters" section in the QualityStage documentation.
? | <
COPY [1] {WF} ;Whatever field
COPY [2](c) {ST} ;StreetType Field
COPY [2](-n) {HN} ;HouseNumber field
etc.
HTH,
J.
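For anyone unfamiliar with the (c) and (-n) COPY options, their effect on a Leading Alphabetic ("<") token can be sketched in plain Python (illustrative only; `split_mixed_token` is my name, not a QualityStage function):

```python
import re

def split_mixed_token(token: str):
    """Roughly what COPY [n](c) and COPY [n](-n) extract from a
    Leading Alphabetic token such as 'BOULEVARD12': the leading
    alphabetic characters and the trailing numeric characters."""
    m = re.fullmatch(r"([A-Za-z]+)(\d+)", token)
    if not m:
        return token, ""
    return m.group(1), m.group(2)  # (c) part, (-n) part

alpha, num = split_mixed_token("BOULEVARD12")
print(alpha, num)  # BOULEVARD 12
```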
John McKeever
Data Migrators
MettleCI (https://www.mettleci.com) - DevOps for DataStage
Hi jhmckeever,
you're right.
My example was too simple: I forgot to mention that I really do need a new token to be created, because downstream there are many rules (written months ago, which I'd rather not modify) that handle other things I didn't describe.
But your solution is really better than my "smart but not optimal" one.
Thank you!
Hi Flavour,
Hmmm - interesting one. One approach could be:
This is just off the top of my head so apologies in advance if this doesn't work!
? | <
COPY [2](c) temp1 ;Leading characters
COPY [2](-n) temp2 ;Trailing numerics
CONCAT " " temp1
CONCAT temp2 temp1 ;Create temp1 " " temp2
RETYPE [2] ? temp1
PATTERN {IP}
J.
Hi jhmckeever,
I'm sorry, but I think the PATTERN command can't create a brand-new token.
My input is:
VIALE FERRARI12 (in the UK I think it would be: 12, BOULEVARD FERRARI)
Your rule (adapted to my inputs) is:
T | <
COPY [2](c) temp1 ;Leading characters
COPY [2](-n) temp2 ;Trailing numerics
CONCAT " " temp1
CONCAT temp2 temp1 ;Create temp1 " " temp2
RETYPE [2] ? temp1
PATTERN {IP}
Immediately after your rule I have this "STOP" rule:
**
COPY_S [1] {UD}
PATTERN {UP}
EXIT
Well, here is what I see:
IP: T+ (It was T< before applying your rule)
UP: T+
UD: VIALE FERRARI 12
However, many thanks for your time!
Hi flavour,
The simplest solution is to take what I described in my previous post and use it as a pre-processor. Just pass your data into the pattern to split the '<' tokens where appropriate, then take the resulting file and pass it into your original pattern file, where it will be re-tokenised into multiple tokens before being processed.
Other alternative solutions are ...
1. Check out the CONVERT_R (Convert with retokenization) operator. You may have to create a clunky solution to use this, but it would permit the creation of a new token on-the-fly.
2. Alternatively, if you're hosting your QualityStage job in a DataStage job you could just identify and split the offending token in a DataStage transformer before submitting it to QualityStage.
HTH,
J.
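Option 2 above (splitting the offending token before it reaches QualityStage) amounts to inserting a space at every letter-to-digit boundary. A minimal sketch of that logic in Python, for illustration only (the transformer itself would of course use DataStage functions, which are not shown here):

```python
import re

def presplit(addr: str) -> str:
    """Insert a space wherever a letter is immediately followed by a
    digit, so 'BOULEVARD12' tokenises as two tokens downstream."""
    return re.sub(r"(?<=[A-Za-z])(?=[0-9])", " ", addr)

print(presplit("VIALE FERRARI12"))  # VIALE FERRARI 12
```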
- Participant
- Posts: 1
- Joined: Mon Feb 05, 2007 4:51 am
Re: Can I split a token in 2 new tokens?
Hi All!
I was really looking for a simpler solution to this, but here is my "acceptably less complex" solution using convert_s for the token splitting:
+|>
copy [2] temp ;save for catch up below
copy "12" splitvalue ;help value for split
retype [2] S splitvalue ;precondition for split
+|S
convert_s [2] @splitvalues.tbl TKN B > ; table with one line "2 2"
+|>|B ; the new born token (B)
retype [2] > temp ;put original value back in
+|>|B
copy [2](c) chr_part ;get streetname suffix
copy [2](-n) num_part ;get house number
retype [2] + chr_part ; put in wanted value
retype [3] ^ num_part ; put in wanted value
RESULT:
+ + ^
SUNSET BOULEVARD 12
flavour wrote: Does anybody know a smart way to get SUNSET BOULEVARD 12 from an input like SUNSET BOULEVARD12?