XML

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

James Kerrr
Participant
Posts: 11
Joined: Wed Dec 03, 2003 10:29 am

XML

Post by James Kerrr »

Hi all,

We are starting to explore the world of XML and I had a few questions.

With 7.0, does anyone have any experience dealing with XML files? Specifically, are there any size limitations on XML sources or targets? Does DataStage handle any of the XML schemas?

Any other feedback on DataStage 7.0's XML improvements would be very helpful.

Thanks again.
Gazelle
Premium Member
Posts: 108
Joined: Mon Nov 24, 2003 11:36 pm
Location: Australia (Melbourne)

Post by Gazelle »

Our experiences with XML in DataStage 7.0:
  1. Trouble with using XML in PX. We resorted to using a Server job.
  2. No "parallelism" under PX, so using a Server job was no great loss.
  3. Big crash (core dump) when parsing large files (roughly 30 MB and up). This was resolved by applying a patch from Ascential.
  4. Could not split an XML file into separate files (e.g. one file for each element within the XML) where there are element hierarchies (e.g. parent-child relationships). We ended up creating a separate job for each element, and processing the one XML file multiple times.
  5. Problems with using DTD schema files:
    1. Incorrectly tried to parse characters within CDATA fields (e.g. could not handle embedded greater-than symbols).
    2. Required the full location of the DTD file, instead of allowing a relative location to be specified.
We also hit a problem with embedded control characters in the XML. I'd guess that most XML parsers would baulk at this, though.

You might want to consider running the XML through a "pre-processor" script, to make it nice and simple for DataStage.
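
As a rough illustration of what such a pre-processor might look like for the control-character problem, here is a minimal Python sketch. It is not something we ran at the time; the function names are hypothetical, and it assumes the input can be decoded as UTF-8. It only removes the C0 control characters that XML 1.0 does not allow (everything below 0x20 except tab, line feed and carriage return).

```python
import re

# C0 control characters that XML 1.0 does not allow:
# everything below 0x20 except tab (0x09), LF (0x0A) and CR (0x0D).
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_control_chars(text, replacement=""):
    """Return text with the illegal control characters replaced."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


def clean_file(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path line by line, stripping illegal characters.

    The encoding default is an assumption; adjust it to match the source file.
    """
    with open(src_path, encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            dst.write(strip_control_chars(line))
```

Something like `clean_file("raw.xml", "clean.xml")` (file names made up) would then run before the DataStage job picks up the file.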
Or use another XML parser to convert the XML to standard sequential files that DataStage can handle easily. We chose not to go down this path because of:
  • The cost of XML parser software.
  • Having to maintain another metadata repository.
That's all I can think of at the moment. Have fun!
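
For anyone who does want to try the conversion route, the idea of turning a parent-child XML file into separate sequential files in a single pass can be sketched with Python's standard-library `xml.etree.ElementTree.iterparse`. This is only a sketch under assumptions: the function name, the pipe delimiter, and the two-level parent/child layout are all hypothetical, and the first parent attribute is assumed to be the key. Each child row carries its parent's key so the hierarchy can be rejoined later.

```python
import csv
import xml.etree.ElementTree as ET


def split_xml(xml_path, parent_tag, child_tag, parent_attrs, child_attrs,
              parent_out, child_out):
    """Write one pipe-delimited file per element type in a single pass.

    parent_attrs[0] is treated as the parent key (an assumption of this sketch).
    """
    key_attr = parent_attrs[0]
    with open(parent_out, "w", newline="", encoding="utf-8") as pf, \
         open(child_out, "w", newline="", encoding="utf-8") as cf:
        parents = csv.writer(pf, delimiter="|")
        children = csv.writer(cf, delimiter="|")
        current_key = None
        for event, elem in ET.iterparse(xml_path, events=("start", "end")):
            if event == "start" and elem.tag == parent_tag:
                # Attributes are already available when the start tag is seen.
                current_key = elem.get(key_attr)
            elif event == "end" and elem.tag == child_tag:
                # Child row: parent key, child attributes, then element text.
                row = [current_key]
                row += [elem.get(a, "") for a in child_attrs]
                row.append((elem.text or "").strip())
                children.writerow(row)
            elif event == "end" and elem.tag == parent_tag:
                parents.writerow([elem.get(a, "") for a in parent_attrs])
                elem.clear()  # drop the finished subtree to limit memory use
                current_key = None
```

With a made-up input such as `<order id="1" customer="acme"><item sku="A1">2</item></order>`, a call like `split_xml("orders.xml", "order", "item", ["id", "customer"], ["sku"], "orders.txt", "items.txt")` would produce one file of orders and one file of items keyed by order id. The column layouts still have to be described to DataStage separately, so the metadata concern above does not go away.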