
Reading XML files without loading structures

Posted: Thu Nov 15, 2007 6:04 am
by vramy
Hi all,

We are wondering how one could read various XML files [b]without writing as many jobs as there are different schemas.[/b]

We have done it in a one-to-one context by loading the metadata within the XML Input stage. Now, as we will have more than 3000 XML files, we want to have a single job that does this:

xml input ------> dataset (or oracle)

knowing that we have the XSD of the input file.

We have already done this on sequential files using osf, and it works well. Has anyone already tried it with XML?

Thanks for your help

Posted: Tue Nov 20, 2007 4:12 pm
by mujeebur
Just wondering, and hoping for the experts' input: why doesn't the XML stage have a 'Runtime Column Propagation' property?

I have done many Parallel jobs without loading the metadata (sequential file stage in a parallel job) by passing the schema file dynamically when running the job.
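For anyone who hasn't used this technique: a sketch of what such a run-time schema file might look like, passed to the Sequential File stage through a job parameter with RCP enabled. The field names and properties here are hypothetical, just to show the shape of the format:

```
// Hypothetical schema file (e.g. referenced via a job parameter),
// read by the Sequential File stage at run time with RCP on.
record
  {final_delim=end, delim=',', quote=double}
(
  cust_id: int32;
  cust_name: string[max=50];
  balance: decimal[10,2];
)
```

Because RCP propagates whatever columns the schema defines, the same job can read files with different layouts simply by pointing the parameter at a different schema file.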

I just wonder why it isn't possible for the XML stage.

Your advice is appreciated.

Posted: Tue Nov 20, 2007 8:16 pm
by eostic
The XML Stages have an interesting history. The current stages are actually a "2.x" implementation --- the first XML Stages (XMLReader and XMLWriter, if anyone remembers) were Universe Basic Stages and did some reasonable parsing, but nothing industry standard (this was in 1998 or 1999, before XSD was set in concrete). For reasons related to development resources, time, and other pressures, the stage implementations needed to exploit the existing column grid for functionality, requiring that engineering use the Description attribute for "actionable" metadata --- the XPATH that we use to actually perform reading and writing. I don't know why they were never enhanced to enable schemas.
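To make the column-grid mechanism concrete, here is a minimal sketch of the idea in Python: each output column carries an XPath (in DataStage, stored in the column's Description attribute) that is evaluated against a repeating element in the document. The element and column names below are hypothetical, purely for illustration:

```python
# Sketch of the "XPath per column" idea behind the XML Input stage.
# Column names, element names, and the sample document are hypothetical.
import xml.etree.ElementTree as ET

XML = """<orders>
  <order><id>1</id><amount>9.99</amount></order>
  <order><id>2</id><amount>24.50</amount></order>
</orders>"""

# column name -> XPath relative to the repetition element
COLUMNS = {"order_id": "id", "order_amount": "amount"}

def parse_rows(xml_text, repetition_path, columns):
    """Yield one dict per repetition element, one key per output column."""
    root = ET.fromstring(xml_text)
    for elem in root.findall(repetition_path):
        yield {col: elem.findtext(xpath) for col, xpath in columns.items()}

rows = list(parse_rows(XML, "order", COLUMNS))
```

The stage does far more than this, of course, but the essence is the same: the grid's Description field, not a schema, drives the parsing.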

In the meantime, the only way to do things in a "reasonably" dynamic fashion is to create your own xslt and supply that xslt at runtime. I say "reasonably" because the output link is fixed per job (unless you want to get into the area of generating your own .dsx files).
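As a rough illustration of the kind of stylesheet involved: a minimal XSLT that flattens a repeating element into delimited rows. The element names are hypothetical; a real stylesheet would be generated or maintained per source schema and supplied to the job at run time:

```xml
<?xml version="1.0"?>
<!-- Hypothetical run-time stylesheet: flattens each <customer>
     element into one comma-delimited row. Names are illustrative. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/customers/customer">
    <xsl:value-of select="id"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="name"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
  <!-- Suppress stray text nodes not matched above -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Swapping in a different stylesheet at run time changes which XPaths feed the (fixed) output link, which is what makes the approach "reasonably" dynamic.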

I've experimented with this, and it appears that you can be fairly liberal with the output columns, defining many more columns than any particular xslt will actually populate.

It's not simple, and requires some in-depth knowledge of xslt, but it is do-able, albeit not the same as other stage types with schema and direct RCP support.

Ernie

Posted: Wed Nov 21, 2007 9:30 am
by mujeebur
Ernie, excellent description, and I appreciate it.

Do we have any enhanced features or RCP support in the XML pack/plug-ins, if bought separately from IBM?

Posted: Wed Nov 21, 2007 9:59 am
by throbinson
We are "toying" with the same concept. The property that makes it work for us is the "Silently drop columns not in table" option in the Teradata Enterprise Stage. This means we can use the xslt to define all possible XPATHs, define columns for every possibility, and have the target stage silently drop the columns that don't apply. The only non-generic aspect is keeping the xslt up to date and defining ALL columns in the XMLInput job.
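The composition of the two techniques can be sketched in a few lines: the job emits a superset of columns, and the target keeps only those the table actually has. Table and column names below are hypothetical:

```python
# Sketch of "silently drop columns not in table": the job produces a
# wide row covering every possible XPath; the target stage keeps only
# the columns that exist in the table. Names are hypothetical.
TABLE_COLUMNS = {"cust_id", "cust_name"}

def load_row(row):
    """Keep only the columns present in the target table."""
    return {k: v for k, v in row.items() if k in TABLE_COLUMNS}

wide_row = {"cust_id": 7, "cust_name": "Ada", "unused_xpath_col": None}
narrow_row = load_row(wide_row)
```

The generic part (the superset of columns) lives in one job; only the xslt and the target table vary per source.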

The other thing we do is "reverse engineer" the xslt from tracing a Server job that hits the same Teradata table written for this very purpose. This method is detailed in the documentation.

Posted: Wed Nov 21, 2007 1:35 pm
by eostic
That's very cool....it confirms what we said above, and answers mujeebur's immediate question: the XML Stages don't support RCP, but downstream Operators smart enough to "throw away" columns can at least make the solution do-able. Nice work. I wonder what other target Operators have that setting.....

Posted: Wed Nov 21, 2007 1:59 pm
by throbinson
I think it's pretty neat too. I just wish we could take the next step with the XMLInput stage and enable RCP. Then the source xml could supply both the data AND the parsing rules (xslt) and the target could supply the target columns and RCP would join the two together.