Dealing with a Complex Mix

Infosphere's Quality Product

Moderators: chulett, rschirm

emeri1md
Participant
Posts: 33
Joined: Tue Jun 17, 2008 10:42 am

Post by emeri1md »

I'm working with data with this structure: 14X15X3. I need to pull each dimension from the overall measurement and place them into variables. I tried using 'MOVE [1](n) temp', but MOVE does not work well with operands. I also tried copying the data to a user variable first, but that didn't help.

I know I can use copy to get the first and last numbers, but that won't help me with the middle part.

How can I work with the complexities of this data?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Can't you use a pattern like ^ | I = "X" | ^ | I = "X" | ^ ? Based on this pattern, the "middle part" would be token [3].
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

ray.wurlod wrote:Can't you use a pattern like ^ | I = "X" | ^ | I = "X" | ^ ? Based on this pattern, the "middle part" would be token [3]. ...
Good idea, Ray.

emeri1md, if this is the only type of data you have in the field, then there is a fairly easy solution.
I'd make a basic ruleset to do this alone (it might be handy as part of a product domain ruleset at some point). You don't have to, but putting a letter in the sep or strip list may give you problems with normal text.

You'd have a DCT file with 3 dimensions plus the basic reporting fields, a CLS file with 'X' in it, and a PAT file with 1 rule in it.

You would put the letter 'X' in the sep list (but not the strip list) to split the text up, then you'd get the pattern that Ray talks about.

If you tack it onto a standard ruleset or have other words in the field, you'll then have to fix up the words that you split up because of the x. Yuck.

If this is the only data in the field, it might just be easier to use a transform step and look for the X's.
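
To see why putting X in the sep list produces the five tokens that Ray's pattern expects, and why it also breaks up ordinary words, here is a rough Python imitation of the tokenizing step. It is not QualityStage code: the `tokenize` function is hypothetical and only approximates how a separator list behaves.

```python
import re

def tokenize(text, separators="X"):
    """Split on whitespace, then split each word around any separator
    character, keeping the separator as a token of its own.
    (Hypothetical helper; a loose stand-in for a SEPLIST.)"""
    pattern = "([" + re.escape(separators) + "])"
    tokens = []
    for word in text.split():
        # re.split with a capture group keeps the separators in the result
        tokens.extend(part for part in re.split(pattern, word) if part)
    return tokens

dims = tokenize("14X15X3")          # five tokens, the middle part is the third
mixed = tokenize("BOX 14X15X3 KG")  # the word BOX is split as well
print(dims, dims[2])
print(mixed)
```

The second call shows the clean-up problem mentioned above: every word containing an X is split too, so those words would have to be reassembled afterwards.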
emeri1md
Participant
Posts: 33
Joined: Tue Jun 17, 2008 10:42 am

Post by emeri1md »

Unfortunately, it's part of a completely freeform text field. Not all records have it, and some have a measurement type (KG, MM, etc) attached to the end as well.

Since it's being picked up as a single @ (complex mix) token, I don't think ^ | I = "X" | ^ | I = "X" | ^ will work either.

I do like the transformer idea, though. I've been looking at it as a purely QS/pattern-action problem. However, there can easily be other "X" within the field, so I can't add "X" to the SEPLIST or STRIPLIST. Let me mess around with a transformer and get back to you.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Another idea. Off the top of my head so may not work.

You have used COPY to get the first and last parts.
So you'd now have something like X99X (variable length number in the middle obviously), right?

So maybe set a flag to say you're in the middle of stripping it out, then
something like:

** | @ [{}(1:1) = "X" and {}(-1:-1) = "X"] | ** | [DimensionStrip="1"]
COPY [2](2:-2) MyOtherDimension
RETYPE [2] 0
emeri1md
Participant
Posts: 33
Joined: Tue Jun 17, 2008 10:42 am

Post by emeri1md »

stuartjvnorton wrote:You have used COPY to get the first and last parts.
So you'd now have something like X99X (variable length number in the middle obviously), right?

That would work, except that COPY doesn't get rid of what it copies; MOVE does that. Unfortunately, MOVE doesn't work with operands (where you have to use a COPY and RETYPE to get the same effect).

So far, I have an imperfect workaround. In a transformer, I have a stage variable that checks whether the field contains at least two X's. If so, I use FIELD (with X as the delimiter) to get the values before, between, and after the X's. Then I rebuild the field with spaces around the X's, so I can use the pattern-action language to work with it.

It's not perfect, so I can't mark this as resolved (yet); I still get false positives.
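
For readers who want to see the shape of this workaround, here is a minimal Python sketch of the logic described above. It is not transformer code: `field` is a hypothetical stand-in for the transformer's FIELD function, `respace_dimensions` is an invented name, and the sketch assumes the whole value is passed in and that anything after a third X is simply dropped.

```python
def field(text, delimiter, occurrence):
    """Return the occurrence-th (1-based) delimited piece, or "" if there
    is no such piece. Loosely imitates FIELD (assumed behaviour)."""
    parts = text.split(delimiter)
    return parts[occurrence - 1] if 1 <= occurrence <= len(parts) else ""

def respace_dimensions(text):
    """If the text contains at least two X's, rebuild it with spaces
    around the first two X's; otherwise return it unchanged."""
    if text.count("X") < 2:
        return text
    first = field(text, "X", 1)
    middle = field(text, "X", 2)
    last = field(text, "X", 3)
    return f"{first} X {middle} X {last}"

print(respace_dimensions("14X15X3"))    # dimension value, re-spaced
print(respace_dimensions("EXTRA BOX"))  # ordinary words, also re-spaced
```

The second call is a false positive of the kind mentioned above: the check only counts X's, so any text with two of them is rewritten whether or not it is a measurement.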

Thanks to everyone for their ideas.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Ugly (and I mean UGLY), but it works for 1st and last parts up to 3 digits.

; Find it, remove it.
** | @ | **
COPY [2] temp
COPY temp(n) {Dim1}
COPY temp(-n) {Dim3}
COPY "1" DimStripping
RETYPE [2] 0

; Strip the 1st part off
** | [DimStripping="1" & {Dim1} LEN = 1]
COPY temp(2:-1) temp

** | [DimStripping="1" & {Dim1} LEN = 2]
COPY temp(3:-1) temp

** | [DimStripping="1" & {Dim1} LEN = 3]
COPY temp(4:-1) temp

; Strip the 3rd part off
** | [DimStripping="1" & {Dim3} LEN = 1]
COPY temp(1:-2) temp

** | [DimStripping="1" & {Dim3} LEN = 2]
COPY temp(1:-3) temp

** | [DimStripping="1" & {Dim3} LEN = 3]
COPY temp(1:-4) temp

; Get the 2nd part.
** | [DimStripping= "1" & temp(1:1) = "X" & temp(-1:-1) = "X"]
COPY temp(2:-2) {Dim2}
COPY "" temp

May still get false positives.
You could fix that by using PICT to test the actual character pattern (and stripping the parts directly), but that is rather time-consuming because you need one rule per combination of digit lengths.

;1 of these for each combination of digits
** | @ [{}PICT = "npnpn" & {}(2:2) = "X" & {}(4:4) = "X"] | **
COPY [2](1:1) {Dim1}
COPY [2](3:3) {Dim2}
COPY [2](5:5) {Dim3}
RETYPE [2] 0
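
The idea behind the PICT test, checking the actual character pattern of the token rather than just looking for X's, can be sketched outside the ruleset as a single regular expression that covers every combination of digit lengths at once. This is Python, not pattern-action language; the names `DIMENSION` and `parse_dimension_token` are invented for the illustration, and the optional trailing letters are an assumption based on the units (KG, MM, etc.) mentioned earlier in the thread.

```python
import re

# digits, X, digits, X, digits, then an optional run of letters for a unit
DIMENSION = re.compile(r"^([0-9]+)X([0-9]+)X([0-9]+)([A-Z]*)$")

def parse_dimension_token(token):
    """Return the three dimensions (and any unit) if the token has the
    exact shape number-X-number-X-number, otherwise None."""
    match = DIMENSION.match(token)
    if match is None:
        return None
    dim1, dim2, dim3, unit = match.groups()
    return {"Dim1": dim1, "Dim2": dim2, "Dim3": dim3, "Unit": unit}

print(parse_dimension_token("14X15X3"))
print(parse_dimension_token("14X15X3KG"))
print(parse_dimension_token("X99X"))  # not a full dimension, so None
```

Because the whole token has to match, a mixed token that merely contains two X's is rejected, which is the class of false positive the simpler rules let through.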




Another option is to use a cut-down ruleset (one that does have X in the SEP list) just to grab the dimensions, optionally putting a marker back to preserve context. Everything else goes into the unhandled output, which then becomes the input to another, more standard ruleset that parses the rest.
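
A rough Python sketch of that two-pass idea follows, purely to show the data flow. It is not a ruleset: `extract_dimensions` and the `<DIM>` marker are invented for the example, and the sketch assumes the dimension arrives as one whitespace-delimited token and ignores any trailing unit.

```python
import re

# number-X-number-X-number, optionally followed by letters (unit is ignored here)
DIMENSION_TOKEN = re.compile(r"^([0-9]+)X([0-9]+)X([0-9]+)[A-Z]*$")

def extract_dimensions(text, marker="<DIM>"):
    """First pass: pull out the first dimension token and leave a marker
    in its place. Returns (dimensions or None, remaining text)."""
    dims = None
    remaining = []
    for token in text.split():
        # only the first matching token is treated as the dimension
        match = DIMENSION_TOKEN.match(token) if dims is None else None
        if match:
            dims = match.groups()
            remaining.append(marker)
        else:
            remaining.append(token)
    return dims, " ".join(remaining)

dims, rest = extract_dimensions("STEEL BOX 14X15X3 BLACK")
print(dims)  # the three dimensions
print(rest)  # text handed to the second, more standard pass
```

The word BOX survives intact here because only the matching token is touched, and the marker records where the measurement sat for the second pass.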