XML Output Stage

Post by bakul »

Hello,

I need to read data from a database and create XML files in DS EE. Generating the XML requires multiple XML Output stages, and I have been able to produce it by keeping the flow sequential. My only concern is performance. Many posts on this forum mention that the XML stages can handle only 50-100 MB of data. They also mention that building the XML in a Transformer is almost 10 times faster.
I am using datasets as input. Each row is around a few KB, but there can be more than 1,500,000 rows. Would the XML Output stage be able to handle this? Are there any other known issues with the XML Output stage?

Regards,
Bakul

Post by vmcburney »

You will notice the XML stages are in the Real Time folder of Designer. XML is best suited to B2B transactions, as individual documents or in small volumes; it is not suited to large-volume batch processing. Do your own benchmark: write out the same file as XML and as a sequential file. It does not matter what columns or what format; just do a ballpark evaluation and decide for yourself.

Writing the XML out via the Sequential File stage will always be faster, as there is absolutely no validation of the format. But you do have to write all the tagging yourself, and it can be hard to maintain. A lot of people do it in server jobs because you can do the tagging in server routines using BASIC code. It can be harder to write in parallel jobs. You could even write your own XML parallel output stage.
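
To make the "write all the tagging yourself" idea concrete, here is a rough sketch in Python rather than DataStage BASIC or a Transformer derivation. It only illustrates the approach: each row becomes one line of hand-built tags, streamed straight to a flat file with no validation. The names row_to_xml and write_xml, the row and rows tags, and the sample columns are all made up for illustration, and it assumes the column names are already valid XML element names.

```python
import io
from xml.sax.saxutils import escape


def row_to_xml(row, row_tag="row"):
    """Render one record (dict of column name -> value) as one line of XML.

    Column names are assumed to be valid XML element names; only the
    values are escaped (&, <, >).
    """
    cells = "".join(
        f"<{name}>{escape(str(value))}</{name}>" for name, value in row.items()
    )
    return f"<{row_tag}>{cells}</{row_tag}>"


def write_xml(rows, out, root_tag="rows"):
    """Stream rows to a text file object one line at a time.

    Nothing is held in memory beyond the current row, which is the point
    of hand tagging for large volumes.
    """
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write(f"<{root_tag}>\n")
    for row in rows:
        out.write(row_to_xml(row) + "\n")
    out.write(f"</{root_tag}>\n")


# Small usage example with hypothetical columns.
buf = io.StringIO()
write_xml([{"id": 1, "name": "A & B"}, {"id": 2, "name": "x < y"}], buf)
xml_text = buf.getvalue()
```

The maintenance cost mentioned above shows up quickly: nested elements, attributes, and escaping all have to be handled by hand, and nothing checks the result against a schema.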