Posted: Mon May 08, 2006 10:16 am
by jazzer1
Is there some way of extracting the data from the XML file without including the tags?

Posted: Mon May 08, 2006 12:56 pm
by jgibby
I tested and found out the SERIESTOTEXT function will not do it. I don't know of a way to extract just the data values when you don't know what the xml tags are.

I have a fix but not very pretty

Posted: Tue May 09, 2006 7:08 am
by jazzer1
I have one solution but I had to hardcode some stuff. This will work for now.....

=SUBSTITUTE(LEAVEPRINT(record01),"<data1>","","</data1>","|") etc.

This returns a result of 111|222|333 etc.

However, I had to hardcode the xml tags to be replaced.
There must be a way to do a wildcard search and replace.
I'll figure that out next.

Thanks to all who contributed.
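
The wildcard search and replace described above can be illustrated outside the map. The following is a minimal Python sketch of the idea, not a map rule: the function name and the sample record are hypothetical, and it assumes the tags contain no ">" characters inside attribute values. It replaces every run of tags with a single pipe, so no tag names are hardcoded.

```python
import re

def strip_tags_to_pipes(xml_text: str) -> str:
    """Replace every run of adjacent tags with one '|' and trim the ends."""
    # <[^>]+> matches any single tag, whatever its name;
    # the outer (?:...)+ collapses adjacent tags into one delimiter.
    piped = re.sub(r"(?:<[^>]+>)+", "|", xml_text)
    # Drop the pipes produced by the leading and trailing tags.
    return piped.strip("|")

# Hypothetical sample record, modelled on the 111|222|333 result above.
sample = "<rec><data1>111</data1><data2>222</data2><data3>333</data3></rec>"
print(strip_tags_to_pipes(sample))  # 111|222|333
```

Because adjacent tags collapse into one pipe, an empty element produces no empty slot in the output. If empty positions matter, the pattern would need to treat each closing tag separately.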

Posted: Tue May 09, 2006 7:29 am
by jgibby
Maybe there is a better way to approach this. If it is possible that you can change the output type tree, then there is a very simple solution.

You are trying to take an xml series object and extract just the data elements into a single object expression on the output side. However, if you modify the designated output field to make it a series object as well, then the problem is solved very simply. Make the target field on the output type tree a series object infix delimited by the pipe character.

New Output Series Object Field Formula:

Code:

=FieldData:DummyRec:Input01

John

Posted: Tue May 09, 2006 8:49 am
by jazzer1
I ended up using this:

=F_String(SUBSTITUTE(LEAVEPRINT(Record1),"<Data1>","","</Data1>","|")) etc...

The only problem is that I have to hardcode the tag names.
The next step is to find a way to search the string with a wildcard.

I just saw the new post.

Thanks very much to all who helped.

Posted: Tue May 09, 2006 9:01 am
by jazzer1
John: I apologize for these basic questions, but how do I define a series object? I have a group defined and a record (0:s) under it.

Posted: Tue May 09, 2006 2:02 pm
by jgibby
I take it you have fields or columns defined under the Record series object. Something like this:

Code:

File
	Record (0:s)
		Field1 (1:1)
		Field2 (1:1)
		Field3 (1:1)
		Field4 (1:1)
		FieldN (1:1)
Create another group object to hold the field in question. Set the group's delimiter, put the field in the group's component window, and set its range. Then put the new group in the Record object's component window in place of the single field.
It would look something like this:

Code:

File
	Record (0:s)
		Field1 (1:1)
		Field2 (1:1)
		Field3 (1:1)
		Field4GRP (1:1)
			Field4 (0:s)
		FieldN (1:1)
I'm running on 7.5.1. If you want to send me the Type Tree, I'll mock it up for you. I'll PM you my email address.

John
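
The effect of nesting Field4 in its own delimited group can be sketched in Python. This is only an illustration of the structure, not type tree syntax: the class, the field values, and the comma used as the record delimiter are assumptions. Only the pipe as the group's infix delimiter comes from the thread.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Record:
    field1: str
    field2: str
    field3: str
    # Stands in for Field4GRP (1:1) holding Field4 (0:s).
    field4_grp: List[str] = field(default_factory=list)
    field_n: str = ""

    def serialize(self, record_delim: str = ",", group_delim: str = "|") -> str:
        # Infix: the delimiter sits between members only, never after the last.
        field4 = group_delim.join(self.field4_grp)
        return record_delim.join(
            [self.field1, self.field2, self.field3, field4, self.field_n]
        )

rec = Record("a", "b", "c", ["111", "222", "333"], "z")
print(rec.serialize())  # a,b,c,111|222|333,z
```

The repeating values occupy a single slot of the record, and the group's own delimiter separates them inside that slot.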

Posted: Wed May 10, 2006 10:01 am
by jazzer1
I'm close....(I think)
Input file looks like this:

<tag1><tag2>dataxxxxxxxxxxx</tag2><tag3>dataxxxxx</tag3>>tag4>dataxxxxxxxxxxxzzz</tag4></tag1>

The input is one continuous stream.

I have the input file defined like this:

Extracted_XML Group
	Extract_Rec (0:s)
		start_tag (0:s) initiator = <, terminator = >
		data_string (0:1)
		end_tag (0:s) initiator = </, terminator = >

I'm trying to write out the data_string stuff separated by a "|"

So far, the output looks like this:

dataxxxxxxxxxxx</tag2><tag3>dataxxxxx</tag3>>tag4>dataxxxxxxxxxxxzzz</tag4></tag1>

The map drops the first two tags (which is good) but writes the rest of the file as is. Am I close?

Posted: Wed May 10, 2006 10:25 am
by jgibby
For some reason, your type tree looks a little strange to me, but I have an idea. Try this:

Code:

=SERIESTOTEXT(
	EXTRACT(
		data_string:Extract_Rec:Extracted_XML
		,data_string:Extract_Rec:Extracted_XML != ""
	) + "|"
)
Looks like it might be worth a shot. Let us know.

John
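
For readers unfamiliar with the rule above, here is a rough Python analogue of what it is meant to do: keep the non-empty data_string values and append a pipe to each before concatenating. The function name and sample values are hypothetical, and the mapping onto EXTRACT and SERIESTOTEXT is an approximation of their behaviour.

```python
from typing import List

def series_to_text(values: List[str]) -> str:
    # Rough analogue of EXTRACT(series, series != ""): keep non-empty members.
    kept = [v for v in values if v != ""]
    # Rough analogue of SERIESTOTEXT(... + "|"): append a pipe to each member
    # and concatenate the results into one text item.
    return "".join(v + "|" for v in kept)

print(series_to_text(["111", "", "222", "333"]))  # 111|222|333|
```

Because the pipe is appended to every member, the result ends with a trailing pipe, unlike an infix-delimited series.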

Posted: Wed May 10, 2006 10:44 am
by jazzer1
Same result. I would have thought something would change but it didn't.

Posted: Wed May 10, 2006 10:52 am
by jazzer1
I looked at the log and the map recognizes the first two tags but then thinks the rest of the stream is data. It's not recognizing the </tag3> after the first chunk of data.
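
One possible reading of that log, not confirmed anywhere in this thread, is that nothing in the definition tells the parser where data_string ends, so it consumes the rest of the stream. The Python sketch below shows the behaviour being aimed for, where a data run ends at the next "<". The function name and the well-formed sample stream are hypothetical.

```python
from typing import List

def extract_data(stream: str) -> List[str]:
    """Collect the text runs between tags; each run ends at the next '<'."""
    values = []
    pos = 0
    while pos < len(stream):
        if stream[pos] == "<":
            # Skip a start or end tag up to and including its '>'.
            close = stream.find(">", pos)
            if close == -1:
                break  # unterminated tag: stop scanning
            pos = close + 1
        else:
            # Data runs until the next tag opens (or the stream ends).
            nxt = stream.find("<", pos)
            end = len(stream) if nxt == -1 else nxt
            values.append(stream[pos:end])
            pos = end
    return values

stream = "<tag1><tag2>aaa</tag2><tag3>bbb</tag3><tag4>ccc</tag4></tag1>"
print("|".join(extract_data(stream)))  # aaa|bbb|ccc
```

The key point is that the data item needs an explicit stopping condition. In this sketch that condition is the "<" that opens the next tag.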