XML Parser Problem

Posted: Tue Sep 14, 2010 7:36 pm
by BradMiller
I have a job where I am extracting an XML file/document which contains 58 from an XML file/document, importing the XML schema from the XML document, and writing to a sequential file. I am importing the metadata using the XML file instead of an XSD. I validated the XML file using the w3schools XML validator and it shows that the XML is good. But when I run the job I get the following warning messages and no records are loaded to the target file. All my other XML jobs are working properly. The warning messages are "XML_Input_0,0: Warning: secondxml.XML_Input_0: XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 2, column: 474): An exception occurred! Type:UnexpectedEOFException, Message:The end of input was not expected" and "XML_Input_0,0: Warning: secondxml.XML_Input_0: XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 1, column: 1): Invalid document structure".

Posted: Wed Sep 15, 2010 3:59 am
by eostic
It sounds like an error in the method you are using to read the document from disk. Are you using the Sequential Stage?

If so, change your job so that it uses the External Source Stage and sends a "list" of xml documents to your xml Stage...

Go to my blog url below and click on the table of contents in the upper right....find the xml section and go to the link concerning "xml sources" for a more detailed explanation.

Ernie

Posted: Wed Sep 15, 2010 1:55 pm
by BradMiller
Yes, we are using the Sequential File stage. I'll find your blog and look at the XML content you posted, but I have one question: why do we need to use External Source instead of Sequential File? I could not understand the problem. I appreciate your response and help.

Posted: Wed Sep 15, 2010 3:22 pm
by chulett
Because technically an XML file isn't a flat file with rows and columns, it's a stream of data that could very well just be one long "record". Sometimes it can be read like one with success but more often reading it in that fashion just plain ol' horks it up. Best Practice is to avoid the Sequential File stage altogether and use an External Source stage to feed just the filenames to the XML Input stage and let it (the XML Input stage) do the actual reading of the files.

In a Server job, the Folder stage would take the place of the External Source stage.
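As a rough analogy outside DataStage, the "feed only the filenames, let the parser do the reading" pattern might look like the Python sketch below. This is not DataStage code; the function names `list_xml_files` and `parse_each` are hypothetical and only illustrate the division of labour between the two stages.

```python
import glob
import os
import xml.etree.ElementTree as ET


def list_xml_files(directory):
    """Play the role of the External Source stage: emit one file path per 'row'.

    Hypothetical helper; it only lists names and never reads file contents.
    """
    return sorted(glob.glob(os.path.join(directory, "*.xml")))


def parse_each(paths):
    """Play the role of the XML Input stage: open and parse each file itself.

    Returns a mapping of file name to root element tag.
    """
    results = {}
    for path in paths:
        # The parser reads the whole document as one stream,
        # so embedded line breaks are not treated as record boundaries.
        root = ET.parse(path).getroot()
        results[os.path.basename(path)] = root.tag
    return results
```

The point of the split is that the first step never touches the document contents, so no record or column delimiter settings can interfere with them.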

Posted: Thu Sep 16, 2010 6:45 am
by eostic
...to add more detail to Craig's response, the problem is usually the fact that any random space or stray CRLF is just "noise" to an xml parser (such characters are formally ignored by the xml standard)....but a stray set of blanks, or a CRLF, or other odd character can ENTIRELY change the behavior of the Sequential Stage.

A Job could work fine for 1000's of documents and then blow up one day because of a CRLF in the middle.....it's no fault of the Sequential Stage, it is designed to look for such things. I know there are settings in the Stage that you can tweak and such, but why bother? Just have the xml stage do the actual i/o and parsing --- it's designed for that.
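To make the failure mode concrete, here is a small Python sketch (an illustration only, not how DataStage works internally) that parses the same text two ways: once as a whole document, and once line by line, the way a record-oriented reader would hand it over. The sample document and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up sample document with a line break inside an element's text.
DOCUMENT = (
    "<orders>\n"
    '  <order id="1">\n'
    "    <note>first line\n"
    "second line</note>\n"
    "  </order>\n"
    "</orders>\n"
)


def parse_whole(text):
    """Parse the complete text as one XML document and return the root tag."""
    return ET.fromstring(text).tag


def count_line_failures(text):
    """Treat every non-blank line as a separate 'record' and try to parse it.

    Returns the number of lines that are not well-formed documents on their own.
    """
    failures = 0
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            ET.fromstring(line)
        except ET.ParseError:
            failures += 1
    return failures
```

Parsing the whole text succeeds, while none of the individual lines is a well-formed document by itself, which is the same kind of "end of input was not expected" situation reported in the original post.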

Ernie

Posted: Thu Sep 16, 2010 12:13 pm
by BradMiller
Thank you, it's very clear.