Reading XML files without loading structures

vramy
Participant
Posts: 1
Joined: Tue Nov 25, 2003 5:15 am

Post by vramy »

Hi all,

We are wondering how one could read various XML files without writing as many jobs as there are different schemas.

We have done it in a one-to-one context by loading the metadata within the XML Input stage. Now, as we will have more than 3000 XML files, we want a job that works like this:

xml input ------> dataset (or oracle)

knowing that we have the xsd of this input file.

We have already done this on sequential files using an osf and it works well. Has anyone already tried it?

Thanks for your help
mujeebur
Participant
Posts: 46
Joined: Sun Mar 06, 2005 3:02 pm
Location: Philly,USA

Post by mujeebur »

I'd like the experts' help with a related question: why does the XML stage not have a 'Runtime Column Propagation' property?

I have built many parallel jobs (sequential file stages) without loading the metadata into the job, by passing the schema file dynamically at run time.

Why can't the same be possible for the XML stage?

I'd appreciate your advice.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

The XML Stages have an interesting history. The current stages are actually a "2.x" implementation. The first XML Stages (XMLReader and XMLWriter, if anyone remembers) were Universe Basic Stages and did some reasonable parsing, but nothing industry standard (this was in 1998 or 1999, before xsd was set in concrete). For reasons related to development resources, time, and other pressures, the stage implementations needed to exploit the existing column grid for functionality, which required engineering to use the Description attribute for "actional" metadata: the XPATH that we use to actually perform reading and writing. I don't know why they were never enhanced to support schemas.

In the meantime, the only way to make things "reasonably" dynamic is to create your own xslt and supply that xslt at runtime. I say "reasonably" because the output link is fixed per job (unless you want to get into the area of generating your own .dsx files).

I've experimented with this, and it appears that you can be fairly liberal with the output columns, and have many more columns defined than you will really have in any particular xslt.

It's not simple, and it requires some in-depth knowledge of xslt, but it is do-able, albeit not the same as other stage types with schema and direct RCP support.
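
To make the "more columns than any one document needs" idea concrete, here is a rough Python sketch. It is not DataStage and not xslt; it only mimics the behaviour with the standard library's ElementTree, which supports a limited XPath subset. The column names, paths, and sample document are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical superset of output columns, each mapped to a path relative
# to the repeating element (names and paths are illustrative only).
COLUMN_PATHS = {
    "order_id": "id",
    "customer": "customer/name",
    "amount": "total",
    "invoice_no": "invoice/number",  # never present in the sample below
}

def extract_rows(xml_text, row_path, column_paths):
    """Return one dict per repeating element; unmatched columns become None."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall(row_path):
        # findtext() returns None when the path matches nothing, so every
        # row carries the full superset of columns.
        rows.append({col: node.findtext(path) for col, path in column_paths.items()})
    return rows

SAMPLE = """<orders>
  <order><id>1</id><customer><name>Acme</name></customer><total>9.50</total></order>
  <order><id>2</id><total>3.00</total></order>
</orders>"""

rows = extract_rows(SAMPLE, "order", COLUMN_PATHS)
for row in rows:
    print(row)
```

Every row comes back with all four columns, and the ones a given document does not supply are simply empty. That is the property the fixed output link relies on.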

Ernie
mujeebur
Participant
Posts: 46
Joined: Sun Mar 06, 2005 3:02 pm
Location: Philly,USA

Post by mujeebur »

Ernie, excellent description, and I appreciate it.

Are there any enhanced features or RCP support in the XML pack/plug-ins if you buy them separately from IBM?
throbinson
Charter Member
Posts: 299
Joined: Wed Nov 13, 2002 5:38 pm
Location: USA

Post by throbinson »

We are "toying" with the same concept. The property that makes it work for us is the "Silently drop columns not in table" in the Teradata Enterprise Stage. This means that we can use the xslt to define all possible XPATHS and have columns defined for all possibilities and have the target stage silently drop all columns that don't apply. The only non-generic aspect is keeping the xslt up to date and defining ALL columns to the XMLInput job.

The other thing we do is "reverse engineer" the xslt from tracing a Server job that hits the same Teradata table written for this very purpose. This method is detailed in the documentation.
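
For anyone trying to picture the "silently drop" step, a minimal Python sketch of the behaviour follows. This is not the Teradata Enterprise Stage itself, just the same idea in plain code; the row, the column names, and the `drop_unknown_columns` helper are hypothetical.

```python
def drop_unknown_columns(row, target_columns):
    """Keep only the fields the target table defines; ignore the rest."""
    return {name: value for name, value in row.items() if name in target_columns}

# A row carrying the full superset of columns defined on the XML input side.
superset_row = {
    "order_id": "1",
    "customer": "Acme",
    "amount": "9.50",
    "invoice_no": None,
}

# Columns that actually exist in this particular (hypothetical) target table.
target_columns = {"order_id", "amount"}

loaded = drop_unknown_columns(superset_row, target_columns)
print(loaded)
```

The upstream side stays generic because it always emits every possible column, and the target decides what it keeps.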
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

That's very cool. It confirms what we said above and answers the immediate question from mujeebur: the XML Stages don't support RCP, but downstream Operators smart enough to "throw away" columns can at least make the solution do-able. Nice work. I wonder what other target Operators have that setting.
throbinson
Charter Member
Posts: 299
Joined: Wed Nov 13, 2002 5:38 pm
Location: USA

Post by throbinson »

I think it's pretty neat too. I just wish we could take the next step with the XMLInput stage and enable RCP. Then the source xml could supply both the data AND the parsing rules (xslt) and the target could supply the target columns and RCP would join the two together.