Dealing with XHTML Files

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Aquilis
Participant
Posts: 204
Joined: Thu Apr 05, 2007 4:54 am
Location: Bangalore

Post by Aquilis »

Hello All,
Has anybody worked with XHTML data files? I have worked with hierarchical XML before, but the client has now come up with XHTML. I have never worked with it before.
Can anybody elaborate on the possible issues with XHTML compared to XML?
1. I can see that XHTML files are much bulkier than simple XML files, since they are a combination of XML and HTML.

I developed a simple job to explore XHTML, but I am ending up with the following error.

XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 0, column: 0): An exception occurred! Type:RuntimeException, Message:The primary document entity could not be opened.
I have tried most things but have not been able to make it work. Does anybody have any suggestions?
Aquilis
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Can't imagine it is supported in any way. I wonder if there are any XHTML -> XML converters out there?
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

From everything I can tell by looking at samples on the web, it's nothing but pure XML. The tags are very HTML-like, meaning that they don't convey "metadata," they convey formatting, but you should definitely be able to pull out the bits and pieces that you want.

The real problem is that the "repeating" elements are format tags. There is no structural consistency, and their ordering and repetition are simply based on how the author wanted things to "look," not on how they are related to each other. So it may be a messy Job with lots of output links, one for each of the deeper repeating format elements that you might like to pull.

It's pretty ugly though. I can't imagine why anyone would use this over XML with a stylesheet for their data.
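To illustrate the point about repeating format tags, here is a minimal sketch in Python (outside DataStage, standard library only) that parses a small XHTML document as ordinary XML and lists which tags repeat. The sample document and the helper names `local_name` and `tag_counts` are made up for illustration; the only real-world detail is the standard XHTML namespace URI. A survey like this may help estimate how many repeating elements, and therefore output links, a Job would need.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample; a real XHTML file would be far larger.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Report</title></head>
  <body>
    <h1>Staff</h1>
    <ul>
      <li><b>Alice</b></li>
      <li><b>Bob</b></li>
    </ul>
    <p>Updated weekly.</p>
  </body>
</html>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def tag_counts(xhtml_text):
    """Count how often each element name occurs anywhere in the document."""
    root = ET.fromstring(xhtml_text)
    return Counter(local_name(el.tag) for el in root.iter())


counts = tag_counts(SAMPLE)
# Tags that occur more than once are candidates for repeating elements.
repeating = sorted(name for name, n in counts.items() if n > 1)
print(repeating)
```

Note that the repeating names that come out are formatting tags such as `li` and `b`, which say nothing about what the data means.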

Your error above is a normal XML error. It is unlikely to have anything to do with the fact that you are reading an XHTML document rather than a regular XML document. How are you picking up the document from disk?
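A message like "The primary document entity could not be opened" typically means the parser could not open the document it was pointed at, for example because of a wrong path, missing permissions, or document content being handed over where a file path was expected. As a rough way to rule out the file itself, the following Python sketch checks outside DataStage that a file exists, is readable, and is well-formed XML. The function name `check_document` and the temporary file are assumptions for illustration only.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def check_document(path):
    """Return a short diagnosis string for the file at `path`."""
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.R_OK):
        return "unreadable"
    try:
        ET.parse(path)
    except ET.ParseError as exc:
        return f"not well-formed: {exc}"
    return "ok"


# Demo with a throwaway file; replace the path with the real XHTML file.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.xhtml")
    with open(good, "w", encoding="utf-8") as fh:
        fh.write('<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>')
    print(check_document(good))
    print(check_document(os.path.join(tmp, "no_such_file.xhtml")))
```

If the file passes this check, the next thing to look at is how the job hands the document to the XML stage, that is, whether it passes a file path or the content itself.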

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
Aquilis
Participant
Posts: 204
Joined: Thu Apr 05, 2007 4:54 am
Location: Bangalore

Post by Aquilis »

Ernie,
Thanks for sharing the information.

You were right, I am using the regular XML table definitions approach to import. Was that wrong? :?
Aquilis
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Absolutely...but as I said, be careful. You are going to end up with all kinds of table definitions, depending on the creativity of the author of your XHTML document. Every repeating list or other item within the "body" might need its own link and its own repeating element. It could get ugly, but it is certainly doable. The problem is that XHTML isn't going to care if you have "employees" at the top of the page and "automobiles" listed at the bottom. If it uses the same XHTML tags to treat these as "bold lists" (for example), then they will simply be an unrelated pair of repeating groups with formatting. Pretty useless. I'd see if the data also exists in non-formatted regular XML.
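The "employees" versus "automobiles" situation can be sketched in Python (standard library only, outside DataStage). The sample document, the names in it, and the helper `bold_lists` are invented for illustration. Both lists use identical markup, so the only thing telling them apart after extraction is their position in the page.

```python
import xml.etree.ElementTree as ET

# Prefix "x" is an arbitrary local alias for the standard XHTML namespace.
NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical page: employees at the top, automobiles at the bottom.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <ul>
      <li><b>Alice</b></li>
      <li><b>Bob</b></li>
    </ul>
    <p>Some text in between.</p>
    <ul>
      <li><b>Sedan</b></li>
      <li><b>Pickup</b></li>
    </ul>
  </body>
</html>"""


def bold_lists(xhtml_text):
    """Return one list of item texts per <ul>, in document order."""
    root = ET.fromstring(xhtml_text)
    groups = []
    for ul in root.iterfind(".//x:ul", NS):
        items = ["".join(li.itertext()).strip() for li in ul.iterfind("x:li", NS)]
        groups.append(items)
    return groups


groups = bold_lists(SAMPLE)
print(groups)
```

Nothing in the result says that the first group holds people and the second holds vehicles; that knowledge has to come from outside the document, which is why a plain XML source for the same data would be preferable.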

Ernie
Ernie Ostic