Difference between the XML stage and the XML Input/Output stages

rmrama · Post by **rmrama** » Sun Dec 02, 2012 5:28 am

Hello and thank you for reading my question.

I've used the XML Input stage to load XML input into targets (db tables, for example). I did not like working with it: the repeating element concept, plus my terrible understanding of XSLT, made me resort to a sub-par design (a separate XML Input stage for each repeating element, with a Join stage to marry the data back together).

I'm looking at the XML stage and it appears to offer a better method of handling placement of elements in the output links. Not having to nominate a repeating element, I see an opportunity to read the XML document once and process all the repeating elements together.
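To make the "read once, process every repeating element" idea concrete, here is a minimal sketch in plain Python rather than DataStage. It assumes a made-up purchase order document; the element names (`purchaseOrder`, `lineItem`, `address`) and the `shred` function are hypothetical and only illustrate the concept, not how the XML stage is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one parent with two independent repeating elements.
DOC = """
<purchaseOrder id="PO-1">
  <lineItem><sku>A100</sku><qty>2</qty></lineItem>
  <lineItem><sku>B200</sku><qty>5</qty></lineItem>
  <address><city>Perth</city></address>
</purchaseOrder>
"""

def shred(xml_text):
    """Parse the document once and return one list of rows per repeating element."""
    root = ET.fromstring(xml_text.strip())
    po_id = root.get("id")
    # Each <lineItem> becomes one "row", carrying the parent key.
    items = [
        {"po_id": po_id, "sku": li.findtext("sku"), "qty": int(li.findtext("qty"))}
        for li in root.findall("lineItem")
    ]
    # Each <address> becomes one "row" on a second, independent output.
    addresses = [
        {"po_id": po_id, "city": ad.findtext("city")}
        for ad in root.findall("address")
    ]
    return items, addresses

items, addresses = shred(DOC)
```

Both row sets come out of a single parse and share the parent key, so no second read and no downstream join is needed just to reach the second repeating element.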

The question: is the XML stage IBM's improved replacement for the XML Input/Output stages, or are they the same thing?

Regards,
Rama

chulett · Post by **chulett** » Sun Dec 02, 2012 8:16 am

The new XML stage is meant to replace the older XML Input and XML Output stages. As you've seen, it is a completely different architecture under the covers and is strictly XSD-driven. Ernie can spell out the advantages (speed, file size, ??) and why there are times when you may need to use the older ones.

eostic · Post by **eostic** » Sun Dec 02, 2012 8:21 pm

It's a long discussion, and there are still some valid reasons to use the older stage... but here are some key points relative to your initial questions:

a) XML is not relational... ETL tools and RDBMSs are. The natural place where they match up is in XML's repeating nodes, where each repeating node path is like a set of directly related (parent-child-grandchild) tables. One instance of a sub-node is like a "row" in a relational table or on a link in a DS Job. The new stage is no different in that respect...

b) ...but it handles the manipulation of those nodes, especially when "building XML", within the Stage. So you don't have to play games with lots of stages, parking the completed XML, performing joins, etc. The concepts are fairly much the same, but the method is far simpler.

c) This means that you can bring in multiple links... perhaps the multiple "purchase order line items" on one link and the multiple "purchase order delivery addresses" on another. You might have 97 line items to be sent to 14 addresses, all of them with a common purchase order: a hierarchy with two completely independent repeating nodes under a common parent. The new stage presents you with "steps". These steps are analogous to stages from a functionality point of view, but the XML stage understands what parts of the hierarchy are being stored from step to step, and it provides join semantics (another step type) to bring them all together.

d) ...you need to have XSDs to do this, and you will have an easier time if you are familiar with XSDs.
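The purchase order case in (c) can be sketched outside DataStage. The following is a rough Python illustration only, assuming two hypothetical row sets (standing in for the two input links) that share a `po_id` key. The `compose` function and all element names are made up for the example; they are not the XML stage's steps or API.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def compose(line_items, addresses):
    """Nest two independent row sets under a shared purchaseOrder parent."""
    # Group each "link" by the common parent key.
    items_by_po = defaultdict(list)
    for row in line_items:
        items_by_po[row["po_id"]].append(row)
    addrs_by_po = defaultdict(list)
    for row in addresses:
        addrs_by_po[row["po_id"]].append(row)

    root = ET.Element("purchaseOrders")
    for po_id in sorted(set(items_by_po) | set(addrs_by_po)):
        po = ET.SubElement(root, "purchaseOrder", id=po_id)
        # First repeating node: one <lineItem> per input row.
        for row in items_by_po[po_id]:
            item = ET.SubElement(po, "lineItem")
            ET.SubElement(item, "sku").text = row["sku"]
            ET.SubElement(item, "qty").text = str(row["qty"])
        # Second, independent repeating node under the same parent.
        for row in addrs_by_po[po_id]:
            addr = ET.SubElement(po, "address")
            ET.SubElement(addr, "city").text = row["city"]
    return root

line_items = [
    {"po_id": "PO-1", "sku": "A100", "qty": 2},
    {"po_id": "PO-1", "sku": "B200", "qty": 5},
]
addresses = [{"po_id": "PO-1", "city": "Perth"}]
xml_text = ET.tostring(compose(line_items, addresses), encoding="unicode")
```

The two lists are never joined to each other; each is attached to the parent separately, which is why 97 line items and 14 addresses do not multiply into 97 x 14 rows.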

Jump into it though... the new stage does a lot of things, but where it really excels is in simplifying the writing of complex XML.

Ernie