
XML Input stage

Posted: Thu Apr 13, 2006 9:46 am
by andyrids
I'm having a problem getting the XML Input stage to work. My input source is a Sequential File stage (reading the actual XML doc), which reads each line as a variable-length string, i.e. with the delimiter set to "000". Each line is sent to the XML Input stage with 'Column content' set to "XML document". The output columns for the XML Input stage are loaded from a table definition I created using the XML Meta Data Importer with the DTD for my XML doc.

The problem I have is with the input - I don't know how to get the XML input stage to understand the lines of input sent from the sequential file source and therefore the XML parsing fails.

All warning messages refer to line 1?? e.g. "Equity_Index..XML_Input_22: XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 1, column: 23): Invalid document structure"

"Equity_Index..XML_Input_22: XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 1, column: 98): Invalid document structure"

etc..

Can anyone help me with this?

Thanks

Re: XML Input stage

Posted: Thu Apr 13, 2006 10:14 am
by ogmios
Not the expert on XMLInput but if I remember correctly you don't read an XML input file with a Sequential Stage :D . You use a FolderStage to point to the XML file and connect that to XMLInput.

"There is one crucial design requirement of XML Input - you need to pass it an input link contain a URL or an file path or an XML document".

If you installed a 7.5 client you should have the documentation on the XML stages on your PC.
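
A likely reason every error points at "line 1": each row arriving from the Sequential File stage is treated as a complete XML document in its own right, and a single line of a multi-line document is rarely well-formed by itself. The sketch below shows the same effect outside DataStage using Python's standard-library parser (not the parser the stage uses). The sample document and its element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are invented for this sketch.
DOC = """<?xml version="1.0"?>
<indices>
  <index name="A">
    <value>100.5</value>
  </index>
</indices>
"""

def parses(text):
    """Return True if text is a well-formed XML document on its own."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# The document as a whole is well-formed.
whole_ok = parses(DOC)

# Fed one line at a time, most lines are rejected: an opening tag without
# its close, or a close without its open, is not a document.
lines = [line.strip() for line in DOC.splitlines() if line.strip()]
failed = [line for line in lines if not parses(line)]

print("whole document parses:", whole_ok)
print("lines rejected when parsed alone:", failed)
```

Passing the whole document (or a path to it) in a single row, which is what the Folder stage arrangement gives you, avoids the problem.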

Ogmios

Posted: Thu Apr 13, 2006 10:33 am
by diamondabhi
Andyrids,
Ogmios is right, you should use a Folder stage as input to the XML Input stage. Also check the style sheet settings.

Thanks,
Abhi.

Posted: Mon Apr 17, 2006 7:02 pm
by aartlett
If you have a lot of XML data to parse and are breaking it down into several streams you might want to consider going outside of DataStage to process the XML into 1 or more flat files and deal with them that way.

I changed a job stream that ran for over 4 hours, processing 45 streams out of 3 largish (few hundred MB) XML files, into a 20-minute stream by using an external XSLT parser. The flat files were then processed in DataStage.

My understanding is that the XML add-on for DS is really meant for trickle-fed real-time data, from MQSeries etc., not for humungous bulk files coming through.

Consider looking beyond the sand pit for solutions, sometimes you'll be surprised :)

Posted: Mon Apr 17, 2006 7:56 pm
by chulett
Hey Andrew, care to pass along the name / site for that external XSLT parser you mentioned? :wink:

Posted: Mon Apr 17, 2006 8:19 pm
by aartlett
I've used two parsers:
Saxon and Xalan/Xerces. Xalan/Xerces is on apache.org, Saxon you'll have to search for.

I think Xalan and Xerces came originally from IBM and were open-sourced and handed over to Apache a few years ago.

There are others available, but I like to use open-source software.

You'll need to read up on XSLT to create the scripts. I might be able to find a copy of the XSLT I used to create CSV files, but I don't have it on hand at the moment.
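
For anyone who wants to see the general idea of flattening XML to CSV outside DataStage, here is a minimal sketch. Note that it is not XSLT and not the script mentioned above: it swaps in Python's standard-library streaming parser (`xml.etree.ElementTree.iterparse`) to do the same kind of job. The input layout, the `index` record element, and the column names are all assumptions made up for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input; element and attribute names are invented for this sketch.
SAMPLE = """<indices>
  <index name="ALPHA"><date>2006-04-13</date><value>100.5</value></index>
  <index name="BETA"><date>2006-04-13</date><value>98.2</value></index>
</indices>"""

def xml_to_csv(source, out):
    """Stream <index> records from source and write one CSV row per record."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "date", "value"])
    # iterparse yields each element when its end tag is seen, so the whole
    # file never has to be loaded before rows start coming out.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "index":
            writer.writerow([
                elem.get("name"),
                elem.findtext("date"),
                elem.findtext("value"),
            ])
            # Drop the record's contents once written to limit memory use.
            elem.clear()

buf = io.StringIO()
xml_to_csv(io.BytesIO(SAMPLE.encode("utf-8")), buf)
csv_text = buf.getvalue()
print(csv_text)
```

With a real file you would pass the file path (or an open binary file) as `source` and an open text file as `out`; the resulting flat file can then be read by a Sequential File stage.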

Posted: Mon Apr 17, 2006 10:56 pm
by chulett
Thanks, I'll have to check them out. Interesting, but my understanding is that the XML Output stage is based on Xerces - perhaps the others as well.

Posted: Tue Apr 18, 2006 12:04 am
by aartlett
I think it is a Xerces/Saxon implementation.

It just doesn't seem to handle the throughput as well as a more "native" one can. Sometimes we must look beyond DS for our solutions.

Posted: Tue Apr 18, 2006 12:32 am
by chulett
Oh, agreed - it can be slow as mud at times generating 'large' amounts of XML... that's why I was curious which external parsers you've used.