
Using massive (20,000 different fields) XSD to parse XML

Posted: Wed Nov 14, 2012 12:51 pm
by smleonard
Hi,

I'm new to the features in 8.5, and we just got access to the new XML stage today [due to a PMR]. I'm working on a project where we have been given two files: an XSD and an XML file. I've imported the XSD into the Schema Library Manager, and it has some 20k different fields. For the moment, all I am trying to do is use the XML stage to write the data out to a sequential file. Because the schema is so large, the job is incredibly slow to respond: I was unable to validate the XML Parser step because it timed out after 10 minutes. What I was hoping to do is use RCP (runtime column propagation) to generate the fields rather than have them defined in the job.

Any thoughts on how I should handle this?

Thanks,
-Sean

Posted: Wed Nov 14, 2012 2:21 pm
by eostic
A couple of things....

a) What is your objective? If it is to parse out a few of the nodes of a document that uses this xsd, figure out which nodes your application actually needs, and get individual sample xml documents that make sense for that use case. An xsd with 20k elements and attributes is generally an "all-encompassing" schema that covers every facet of your company or industry, not just the problem immediately at hand.
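One rough way to do the triage in (a), before touching the xsd at all, is to inventory which element paths actually occur in a few real sample documents. The sketch below is plain Python using only the standard library's `xml.etree.ElementTree`; it is not a DataStage feature, and the helper name `element_paths` and the sample document are made up for illustration. Namespaced tags show up in ElementTree's `{uri}name` form.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(xml_text: str) -> Counter:
    """Count every distinct element path (e.g. 'order/line/sku') in one document.

    Hypothetical helper for illustration. Paths are built from tag names exactly
    as ElementTree reports them, so namespaced tags appear as '{uri}name'.
    """
    counts: Counter = Counter()

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        counts[path] += 1
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


if __name__ == "__main__":
    # Made-up sample standing in for a real run-time document.
    sample = (
        '<order id="1">'
        "<customer><name>Acme</name></customer>"
        "<line><sku>A-1</sku><qty>2</qty></line>"
        "<line><sku>B-7</sku><qty>1</qty></line>"
        "</order>"
    )
    for path, n in sorted(element_paths(sample).items()):
        print(f"{n:3d}  {path}")
```

Running it over a representative set of documents (and merging the counters) gives a short list of paths that really matter; anything that never appears is a candidate to leave out of the smaller xsd described in (b).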

b) If you want to work with the new xml stage (there are good reasons for doing so, and also reasons why it isn't necessary), create, or have someone create, a separate xsd for each of the interesting parts identified in (a). You can also take your smaller run-time sample document(s) and pass them through "trang", a free open-source tool that generates an xsd from sample xml documents and works very nicely.

c) If you are just reading these documents and putting them into a relational database, you are likely to be just fine with the existing xmlInput Stage, and can start using it with the run-time documents that pertain to your application.
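For the case in (c), the work mostly comes down to mapping one repeating element to rows and a few child paths to columns. As a tool-neutral analogy only (plain Python, not the xmlInput Stage itself), the sketch below uses ElementTree's limited path support to pull one row per `line` element and loads the rows into an in-memory SQLite table. The document layout, the `order_lines` table, and the `load_lines` helper are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_lines(xml_text: str, conn: sqlite3.Connection) -> int:
    """Insert one row per <line> element directly under the document root.

    Hypothetical layout: <order id="..."><line><sku/><qty/></line>...</order>.
    Returns the number of rows inserted.
    """
    root = ET.fromstring(xml_text)
    order_id = root.get("id")
    rows = [
        # findtext returns None for a missing child, which SQLite stores as NULL.
        (order_id, line.findtext("sku"), line.findtext("qty"))
        for line in root.findall("line")
    ]
    conn.executemany(
        "INSERT INTO order_lines (order_id, sku, qty) VALUES (?, ?, ?)", rows
    )
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # INTEGER affinity makes SQLite store numeric text such as "2" as an integer.
    conn.execute("CREATE TABLE order_lines (order_id TEXT, sku TEXT, qty INTEGER)")
    sample = (
        '<order id="1">'
        "<line><sku>A-1</sku><qty>2</qty></line>"
        "<line><sku>B-7</sku><qty>1</qty></line>"
        "</order>"
    )
    print(load_lines(sample, conn))
    print(conn.execute("SELECT * FROM order_lines ORDER BY sku").fetchall())
```

The point of the design is that only the handful of paths the target table needs are ever named; the thousands of other items in the schema never come into play.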

There are lots of other things to consider, but that should get you started, and let's keep discussing it.

Ernie

Posted: Wed Nov 14, 2012 2:23 pm
by eostic
[btw: "huge" xsd's are unwieldy in any circumstance, with any tooling, so 9.1, announced in October 2012 at IOD, introduces Schema Views, the ability to subset the xsd within the DataStage environment. This makes development far less error-prone and far less time-consuming, and also speeds up run-time initialization.]

Ernie