Using massive (20,000 different fields) XSD to parse XML

smleonard
Participant
Posts: 23
Joined: Tue Apr 27, 2004 11:48 am
Location: Westfield Center, OH

Post by smleonard »

Hi,

I'm new to the features in 8.5, and we just got access to the new XML stage today [due to a PMR]. I'm working on a project where we have been given two files: an XSD and an XML file. I've imported the XSD into the Schema Library Manager, and it has some 20k different fields. For the moment, all I am trying to do is use the XML stage to output the data to a sequential file. Because the schema is so large, the job is incredibly slow to respond. I was unable to validate the XML Parser because it timed out after 10 minutes. What I was hoping to do is use RCP to generate the fields rather than have them defined in the job.

Any thoughts on how I should handle this?

Thanks,
-Sean
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

A couple of things....

a) What is your objective? If it is to parse out a few of the nodes of a document that uses this XSD, figure out which nodes your application actually needs, and get individual XML documents that make sense for that use case. XSDs with 20k elements and attributes are generally "all-encompassing" schemas that apply to every facet of your company or industry, not just the problem immediately at hand.

b) If you want to work with the new XML stage (there are good reasons for doing so, and also reasons why it isn't necessary), create, or have someone create, a separate XSD for each of the interesting parts identified in (a). You can also take your smaller run-time sample document(s) and pass them through "trang", a free open source tool that generates an XSD from XML and works very nicely.

c) If you are just reading these documents and putting them into a relational database, you are likely to be just fine with the existing xmlInput Stage, and can start using that with the run-time documents that pertain to your application.
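
As a rough illustration of (a), and nothing DataStage-specific, here is a minimal Python sketch that lists which element paths actually occur in a run-time document and pulls the interesting subtrees out as small standalone documents (which could then go through trang as in (b)). The sample document and the element names (`Policy`, `Driver`, and so on) are made up for the example, and the sketch assumes no XML namespaces; with namespaces, the tags would appear as `{uri}name`.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document; the element names are invented for illustration.
SAMPLE = """<Policy>
  <Header><Id>42</Id></Header>
  <Drivers>
    <Driver><Name>Ann</Name></Driver>
    <Driver><Name>Bob</Name></Driver>
  </Drivers>
</Policy>"""


def used_paths(root):
    """Count every distinct element path that occurs in the document."""
    counts = Counter()

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        counts[path] += 1
        for child in elem:
            walk(child, path)

    walk(root, "")
    return counts


def extract_subtrees(root, tag):
    """Serialize each element named `tag` as its own small XML document."""
    return [ET.tostring(e, encoding="unicode").strip() for e in root.iter(tag)]


root = ET.fromstring(SAMPLE)
paths = used_paths(root)                  # which paths the data really uses
drivers = extract_subtrees(root, "Driver")  # one small document per Driver

for path, count in sorted(paths.items()):
    print(count, path)
```

Comparing the printed paths against the 20k fields in the full XSD gives a quick idea of how small the schema for a given use case could be.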

There are lots of other things to consider, but that should get you started, and let's keep discussing it.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

[btw --- "huge" XSDs are unwieldy in any circumstance, with any tooling --- so in 9.1, announced in October 2012 at IOD, we introduced Schema Views, which is the ability to subset the XSD within the DataStage environment. This makes development far less error-prone and far less time-consuming, and also speeds up run-time initialization].

Ernie