DSXchange

Post by **James Kerrr** » Mon Apr 26, 2004 3:02 pm

Hi all,

We are starting to explore the world of XML and I had a few questions.

Does anyone have experience dealing with XML files in DataStage 7.0? Specifically, are there any size limitations on XML sources or targets? Does DataStage handle any of the XML schemas?

Any other feedback on DataStage 7.0's XML improvements would be very helpful.

Thanks again.

Post by **Gazelle** » Tue Apr 27, 2004 12:51 am

Our experiences with XML in DataStage 7.0:

We had trouble using XML in PX jobs and resorted to using a Server job.
XML processing gets no "parallelism" under PX anyway, so using a Server job was no great loss.
Parsing large files (~30MB and up) caused a big crash (core dump). This was resolved by applying a patch from Ascential.
We could not split an XML file into separate files (e.g. one file for each element within the XML) where the elements form a hierarchy (e.g. parent-child relationships). We ended up creating a separate job for each element and processing the same XML file multiple times.
Problems with using DTD schema files:
1. DataStage incorrectly tried to parse characters inside CDATA sections (e.g. it could not handle embedded greater-than symbols).
2. It required the full location of the DTD file, instead of allowing a relative location to be specified.
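
For anyone wondering what the "one file per element" split looks like outside DataStage, here is a minimal sketch in Python using only the standard library. It is not what we ran, just an illustration. The tag names (`order`, `item`), the key attribute (`id`) and the helper names are all made up for the example, and it assumes parent elements are not nested inside each other. It reads the XML once, streams it so a large file is not held in memory all at once, and copies the parent key onto each child row so the hierarchy can be rebuilt with a join later.

```python
import csv
import io
import xml.etree.ElementTree as ET


def split_by_element(xml_source, parent_tag, child_tag, key_attr):
    """Stream an XML document once and return (parent_rows, child_rows).

    Each child row carries the key of its enclosing parent.
    Assumes parent elements are not nested inside one another.
    """
    parent_rows, child_rows = [], []
    current_key = None
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        if event == "start" and elem.tag == parent_tag:
            # Attributes are already available when the start tag is seen.
            current_key = elem.get(key_attr)
        elif event == "end" and elem.tag == child_tag:
            # CDATA content arrives as ordinary text, '>' included.
            child_rows.append([current_key, (elem.text or "").strip()])
        elif event == "end" and elem.tag == parent_tag:
            parent_rows.append([current_key])
            elem.clear()  # drop the finished subtree to limit memory use
    return parent_rows, child_rows


def write_delimited(rows, header, out):
    """Write rows as a pipe-delimited sequential file with a header line."""
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical sample document; real input would be a file path or file object.
    sample = io.BytesIO(
        b'<orders><order id="1"><item>pen</item>'
        b"<item><![CDATA[a > b]]></item></order>"
        b'<order id="2"><item>ink</item></order></orders>'
    )
    parents, children = split_by_element(sample, "order", "item", "id")
    out = io.StringIO()
    write_delimited(children, ["order_id", "item"], out)
    print(out.getvalue())
```

Each output file can then be read as a plain sequential file, with the parent key column used to join children back to their parents.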

We also hit a problem with embedded control characters in the XML. I'd guess that most XML parsers would baulk at this, though.

You might want to consider running the XML through a "pre-processor" script to make it nice and simple for DataStage.
Or use another XML parser to convert it to standard sequential files that DataStage can handle easily. We chose not to go down this path due to:

The cost of XML parser software.
Having to maintain another metadata repository.
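
As a rough idea of what a "pre-processor" for the control-character problem might look like, here is a small Python sketch. It is an assumption about one possible approach, not the script we used, and the function names are invented for the example. It removes the control characters that XML 1.0 does not allow (everything below 0x20 except tab, line feed and carriage return) and includes a quick check showing that Python's own parser rejects the raw input but accepts the cleaned version.

```python
import re
import xml.etree.ElementTree as ET

# Control characters forbidden by XML 1.0: all below 0x20 except tab, LF and CR.
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_control_chars(text, replacement=""):
    """Return text with XML-illegal control characters replaced."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


def preprocess_file(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path line by line, stripping illegal characters.

    The encoding default is an assumption; use whatever the XML declares.
    """
    with open(src_path, encoding=encoding, newline="") as src, \
            open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            dst.write(strip_control_chars(line))


def parses_cleanly(xml_text):
    """Return True if the standard-library parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    dirty = "<row>abc\x01def</row>"
    print(parses_cleanly(dirty))
    print(parses_cleanly(strip_control_chars(dirty)))
```

Whether to drop the characters or replace them with a space (via the `replacement` argument) depends on what the downstream fields are expected to hold.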

That's all I can think of at the moment. Have fun!