Non-greedy pattern matching?

surfsup
Participant
Posts: 18
Joined: Thu Apr 23, 2009 8:43 am

Post by surfsup »

Howdy,

For reasons too convoluted to explain, I need to interpret numbers written in scientific notation, and I need to do that in BASIC. I need to extract the sign, the base, the exponent sign and the exponent.

To somewhat duplicate the functionality of QS, I initially thought of using pattern matching. The main (undocumented) problem is that the pattern matching of QS is greedy and there's no mention of this in the IBM docs.

I know this is a long shot, but is there a magic way of changing the pattern matching in BASIC from greedy to non-greedy?

Although the mask below isn't correct, it illustrates my point better than the correct one: the 0x gobbles up as much as it can, too much for the result to be useful.

value: 1.234e12 mask: 1N0x0N1x0N returns 4 fields: 1 | .234e1 || 2 (third field is empty).
I'd expect it to return 5 fields: 1 | . | 234 | e | 12
Edit: changed engine to server
Last edited by surfsup on Thu Jul 19, 2012 7:10 am, edited 1 time in total.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

How about being just slightly more specific with your mask? "0N'.'0N1A0N" in conjunction with the MatchField() function should work with a full (five piece) scientific notation number.

If the number is 1.234e10 then:
MatchField(InLink.TheNumber, "0N'.'0N1A0N",1) returns "1"
MatchField(InLink.TheNumber, "0N'.'0N1A0N",2) returns "."
MatchField(InLink.TheNumber, "0N'.'0N1A0N",3) returns "234"
MatchField(InLink.TheNumber, "0N'.'0N1A0N",4) returns "e"
MatchField(InLink.TheNumber, "0N'.'0N1A0N",5) returns "10"
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by surfsup »

ray.wurlod wrote: How about being just slightly more specific with your mask? "0N'.'0N1A0N" in conjunction with the MatchField() function should work with a full (five piece) scientific notation number.

If the numbe ...
I can't see your full point and I may be missing something, but it doesn't seem as straightforward as writing the one correct greedy pattern.

Some numbers don't come in with a decimal point (e.g. 1E+10), and there are 18 variations I can think of:

(sign){number(s)}((decimal sign){number(s)}){exponent}(sign){number(s)}

Where () is optional, {} is mandatory.
There are 3 possibilities for each of the two signs (+, - and absent) and 2 for the decimal part (decimal sign plus the digits following it, present or absent), giving 3 x 2 x 3 = 18 variations.

The one non-greedy string that would match all would be:

(0X)(1N0N)(0X)(0N)(1X)(0X)(1N0N)
(sign){number}((decimal sign){number}){exponent}(sign){number}
This assumes the interpreter would know to stop matching the X token as soon as the first N token is found. In that scenario MatchField() would work like a charm.

I could write 18 greedy patterns, ordered from the least to the most inclusive, but I think that would be harder to maintain (and uglier) than disassembling the string in "non-pattern" fashion.
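
For illustration, the "non-pattern" disassembly mentioned above could look something like the sketch below, written as the body of a server routine. It is untested and rests on several assumptions: the input arrives in the routine argument Arg1, the exponent marker is "E" or "e", and the input is already well formed. All variable names and the pipe-delimited return value are made up for the example.

```
* Sketch only. Assumes Arg1 holds well-formed input such as "-1.234E+10".
Work = UpCase(Trim(Arg1))

* Split on the exponent marker: mantissa before "E", exponent after it.
Mantissa = Field(Work, "E", 1)
ExpDigits = Field(Work, "E", 2)

* Peel an optional leading sign off the mantissa.
MantSign = ""
Ch = Mantissa[1,1]
If Len(Ch) = 1 And Index("+-", Ch, 1) > 0 Then
   MantSign = Ch
   Mantissa = Mantissa[2, Len(Mantissa)]
End

* Peel an optional leading sign off the exponent.
ExpSign = ""
Ch = ExpDigits[1,1]
If Len(Ch) = 1 And Index("+-", Ch, 1) > 0 Then
   ExpSign = Ch
   ExpDigits = ExpDigits[2, Len(ExpDigits)]
End

* Split the mantissa on the decimal point.
* FracPart stays empty when there is no decimal point.
IntPart = Field(Mantissa, ".", 1)
FracPart = Field(Mantissa, ".", 2)

* Hand the pieces back as one pipe-delimited string (illustrative format).
Ans = MantSign : "|" : IntPart : "|" : FracPart : "|" : ExpSign : "|" : ExpDigits
```

Because each optional piece is handled by its own test, the 18 variations need no separate cases. Validation of the digits themselves (for example with Num()) is left out here.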

Post by ray.wurlod »

The Matches operator and the MatchField() function both allow multiple patterns, supplied as a value-mark-delimited list.
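
A minimal, untested sketch of what such a list could look like with the Matches operator, reusing the mask from earlier in the thread and adding one alternative without the decimal part. Arg1 as the input argument and the variable name Patterns are assumptions made for the example, and only unsigned numbers are covered; a complete list would need further entries for the signed variants.

```
* Illustrative subset only: unsigned numbers with and without a decimal part.
* Arg1 is assumed to hold the input string.
Patterns = "0N'.'0N1A0N" : @VM : "0N1A0N"

* Matches is true when the input fits any one pattern in the list.
If Arg1 Matches Patterns Then
   Ans = 1
End Else
   Ans = 0
End
```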