XML2 Reader and ">" Character
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
It is a confusing definition. I read it as ">" either being replaced by "&gt;" or escaped as "&gt;", and that a ">" on its own is invalid.
The only thing I can suggest is that you either process this file as a sequential file, parsing the XML into text strings, or pre-process it first, finding and replacing embedded ">" characters with "&gt;". You could use a routine that reads the sequential file and replaces all instances of ">" that occur between the <answer> </answer> brackets.
Or better yet change the application that generates the XML.
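A minimal sketch of that pre-processing idea, in Python for illustration (the element name `answer` comes from the post; this assumes the elements are not nested and carry no attributes):

```python
import re

def escape_gt_in_answer(xml_text):
    # Escape ">" characters that appear inside <answer>...</answer> data.
    # Assumption: <answer> elements are not nested and have no attributes.
    def fix(match):
        return "<answer>" + match.group(1).replace(">", "&gt;") + "</answer>"
    return re.sub(r"<answer>(.*?)</answer>", fix, xml_text, flags=re.DOTALL)

print(escape_gt_in_answer("<answer>5 > 3</answer>"))
# -> <answer>5 &gt; 3</answer>
```

Data outside the `<answer>` elements is left untouched, so tag delimiters elsewhere in the file survive.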
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
-
- Participant
- Posts: 7
- Joined: Thu Oct 23, 2003 10:00 am
Unfortunately this is a data file supplied to us, so we are at the mercy of the quality of the data sent to us. I'd thought about a pre-process routine, and we may eventually have to do this if we don't get an update from the supplier. I was hoping there were some options in the XML2 reader stage that might be able to circumvent this, but I guess not.
Thanks,
Dave
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
... and, on that basis, you go back to the providers of the XML file and complain bitterly to them that what they're sending to you is not well-formed XML and therefore impossible to process, and require that they do send you well-formed XML.
It's why we have standards!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 7
- Joined: Thu Oct 23, 2003 10:00 am
wdudek wrote: The > character is definitely not valid by itself as data in an XML file; open the XML file containing it in Internet Explorer and it will complain that it is not formed properly.
The issue is that some newer parsers don't complain about a ">" inside element data, as they can quite happily work out that the first ">" after a "<" must be the close of the tag and that any others are just part of the data. A "<", on the other hand, must always be written as "&lt;" within data, since any reader would otherwise interpret it as opening a tag.
In fact, if you try it in Internet Explorer you'll see that it is one such reader that will quite happily handle ">" within the data (at least the IE6 I'm using here seems to).
ray.wurlod wrote: ... and, on that basis, you go back to the providers of the XML file and complain bitterly to them that what they're sending to you is not well-formed XML and therefore impossible to process, and require that they do send you well-formed XML. It's why we have standards!
We have. And in their case they are just using other XML writers that output in this fashion.
The curious thing is that if I go back to the XML 1 pack, the reader appears to handle ">" within data fine!
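For what it's worth, the XML 1.0 recommendation only requires a literal ">" in character data to be escaped when it would form the sequence "]]>", while a bare "<" is always illegal there. A quick check with a standard parser (Python's xml.etree, purely for illustration) bears out what the lenient readers are doing:

```python
import xml.etree.ElementTree as ET

# A bare ">" in character data is well-formed, so a conforming parser accepts it:
assert ET.fromstring("<answer>5 > 3</answer>").text == "5 > 3"

# A bare "<" in character data is not well-formed:
try:
    ET.fromstring("<answer>5 < 3</answer>")
    raise AssertionError("should not have parsed")
except ET.ParseError:
    pass

# Escaped, both characters round-trip:
assert ET.fromstring("<answer>5 &lt; 3 &gt; 2</answer>").text == "5 < 3 > 2"
```

On that reading, a reader that rejects a bare ">" in data is being stricter than the spec demands.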
-
- Participant
- Posts: 7
- Joined: Thu Oct 23, 2003 10:00 am
In case anyone is interested and hits a similar issue, here's a routine I've written to circumvent the issue.
It takes the whole of the XML file data as an input parameter string (P_InputData), hence can be placed in a Transformer following a Folder stage, and returns a string with any ">" characters within element data replaced by "&gt;".
(If anyone spots any bugs or potential optimisations, please point them out. Note that this doesn't cater for ">" within attribute data, although I would hope that wouldn't occur in most XML specifications.)
Code: Select all
L_InputDataLen = Len(P_InputData)
* Each unmatched ">" grows by 3 characters when rewritten as "&gt;"
L_NoExtraGts = Count(P_InputData, ">") - Count(P_InputData, "<")
L_OutputData = Space(L_InputDataLen + (L_NoExtraGts * 3))
L_OutChar = 1
L_TagOpen = @FALSE
For i = 1 To L_InputDataLen
   L_CurrChar = P_InputData[i,1]
   If L_CurrChar = "<" Then
      L_TagOpen = @TRUE
   End
   If L_CurrChar = ">" And L_TagOpen = @FALSE Then
      L_OutputData[L_OutChar,4] = "&gt;"
      L_OutChar = L_OutChar + 4
   End Else
      L_OutputData[L_OutChar,1] = L_CurrChar
      L_OutChar = L_OutChar + 1
   End
   If L_CurrChar = ">" And L_TagOpen = @TRUE Then
      L_TagOpen = @FALSE
   End
Next i
Ans = L_OutputData
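The same scan translates almost line for line into other languages; a Python version (function name mine) that may be easier to test outside DataStage:

```python
def escape_unmatched_gt(xml_text):
    # Port of the BASIC routine above: copy characters through, tracking
    # whether we are inside a tag; a ">" seen outside a tag becomes "&gt;".
    out = []
    in_tag = False
    for ch in xml_text:
        if ch == "<":
            in_tag = True
        if ch == ">" and not in_tag:
            out.append("&gt;")
        else:
            out.append(ch)
        if ch == ">" and in_tag:
            in_tag = False
    return "".join(out)

print(escape_unmatched_gt("<answer>a > b</answer>"))
# -> <answer>a &gt; b</answer>
```

Like the original, it does not handle ">" inside attribute values, and a "<" in data will still confuse it.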
Why not simply:
Code: Select all
NewLine = Change(OldLine, "<", "&lt;")
NewLine = Change(NewLine, ">", "&gt;")
Some people prefer Ereplace; Change is the same function.
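One caveat with a blanket replace like this: it rewrites the tag delimiters too, so it only suits text that does not already contain markup. A quick Python illustration:

```python
line = "<answer>5 > 3</answer>"
escaped = line.replace("<", "&lt;").replace(">", "&gt;")
print(escaped)
# -> &lt;answer&gt;5 &gt; 3&lt;/answer&gt;  (the tags are escaped along with the data)
```

That is exactly why Dave's routine goes to the trouble of tracking whether it is inside a tag.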
Mamu Kim
-
- Premium Member
- Posts: 483
- Joined: Thu Jun 12, 2003 4:47 pm
- Location: St. Louis, Missouri USA
I do this before I turn it into XML or HTML. I have BASIC code which generates HTML. How do you separate valid tags from code-generated tags?
If you look at my KgdGenHtmlRoutines job posted on my web site and ADN then you can see how it was implemented.
Last edited by kduke on Fri Jul 16, 2004 4:40 pm, edited 1 time in total.
Mamu Kim
-
- Premium Member
- Posts: 483
- Joined: Thu Jun 12, 2003 4:47 pm
- Location: St. Louis, Missouri USA
Tony
I looked and I did not post this code. So here it is.
Code: Select all
done = @FALSE
SpaceCnt = 0
* Count the leading spaces so the indentation can be preserved in HTML
For x = 1 To Len(tmpLine) Until done
   check = tmpLine[x,1]
   If check = " " Then
      SpaceCnt += 1
   End Else
      done = @TRUE
   End
Next x
If SpaceCnt > 0 Then
   tmpLine = Trim(tmpLine, ' ', 'L')
   For x = 1 To SpaceCnt
      tmpLine = "&nbsp;" : tmpLine
   Next x
End
tmpLine = Change(tmpLine, "<", "&lt;")
tmpLine = Change(tmpLine, ">", "&gt;")
tmpLine = Change(tmpLine, Char(13):Char(10), "<br>")
If Field(Trim(tmpLine), '.', 2)[1,3] = 'htm' Then
   tmpLine = '<A href="' : Trim(tmpLine) : '">' : Field(Trim(tmpLine), '.', 1) : '</A>'
End
If tmpLine = "" Then
   tmp := tmpLine : "&nbsp;"
End Else
   tmp := tmpLine
End
Dave, I was thinking only about output. You could do this before it is turned into XML or HTML, but if it came in that way then you have broken XML or HTML because of the embedded tags.
This code is from when I generate HTML documentation for routines. The jobs and routines are posted on my web site and ADN. It will loop through the code and wrap HTML around it. I duplicated what Ascential had done in the job documentation. They did not have anything for routines. I copied the logic out of genroutinedocs or something like that. It was posted on ADN as well. The ASCL post would only do one job. I wanted to document all jobs, or just one category, as well. I wanted a simple index with links to all the jobs or routines. So it creates a directory and generates HTML for all the jobs and then one HTML page called all_index.html with links to all the other pages. It also does the same for routines in a different directory. I think I made the directory KimD/Jobs/20040726 for today, so you could get a snapshot of how the jobs looked on a given day. The ASCL code is super fast. It will document hundreds of jobs in seconds. Same for routines.
If it had a where-used section then you would not need DwNav or MetaStage. Not really, but it is very nice HTML. It really does need a table lookup section. DwNav has generated similar documentation for a while now. It did not look as pretty until last week. Now it looks the same. I do not want to make money on DwNav. I just need it to fill a hole when I think DataStage or MetaStage lacks something that I want or need. So I wanted the HTML to all look the same so I could use both together. Anyway, DwNav has an index by table name. So all table names or hash file names have a link to the job that reads or writes to them. That is my idea of where-used. The table names are sorted first by what Reporting Assistant calls OLETYPE. This is a stage type like CHashInput or CSeqInput. They are ugly names but I think most of us can figure them out. I thought about building a lookup to change this into something more readable. When I get time, I guess. Next version.
On my web site there is a link to the html documentation generated by these jobs. There is a link right above the zip file link. So you can get an idea if you like the way it looks.
Since I downloaded the DataStageBackup.bat file, I like the YYYYMMDD folder concept. I am using it in this documentation stuff. The idea is this: you promote once a month, or twice, or whatever. You run the backup script, DSaveAsBmp.bat and the documentation jobs, and you take a snapshot of things before and after you change them. Sort of an ETL audit trail.
I am not sure there is much more I could do to tie all these pieces together except to tie in Version Control. I think if you named your batches YYYYMMDD then that might help. I think we need Craig or Byron, one of the VC experts, to figure that out. Trying to build some kind of best-practice automated software lifecycle.
What do you think? Waste of time? Overkill?
Mamu Kim