I'm developing jobs to manipulate sets of records where it is proposed that they will be delivered to me encapsulated in an XML document.
I'm concerned that performance may be a problem.
Any members willing to share experience/opinions?
Running DS 7.0 Parallel, currently on a Sun ES15K, but will be deployed on HP-UX, etc.
Thanks,
Carter
XML with DS 6/7 and Oracle 9i
Reading and writing XML files is slower than sequential files, but has similar performance to database stages. I did a simple test writing out 10 records, and while a job with a sequential output processed at 9,000 rows/sec, the same job with an XML output ran at 1,500 rows/sec. However, XML files sit on the DataStage server, so you'll have no network overhead. With parallel extender on a Solaris server it should process very high volumes in a short period of time.
Your Oracle database and your transformations are much more likely to produce performance problems if they are not used correctly. If you are new to DataStage, search through the archive of this site for threads on Oracle bulk load and the benefits of hash file lookups over Oracle lookups.
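To give a feel for the kind of gap Vincent describes, here's a rough Python sketch (not DataStage; the three-column record layout and 10,000-row count are made up for illustration) comparing throughput for a delimited sequential write versus building and serialising the same rows as XML:

```python
import time
import xml.etree.ElementTree as ET

# Hypothetical records -- layout and volume chosen just for the demo.
records = [{"id": i, "name": f"name{i}", "amount": i * 1.5} for i in range(10000)]

# Flat sequential output: one pipe-delimited line per record.
start = time.perf_counter()
with open("out.txt", "w") as f:
    for r in records:
        f.write(f"{r['id']}|{r['name']}|{r['amount']}\n")
seq_secs = time.perf_counter() - start

# XML output: wrap each record in an element and serialise the tree.
start = time.perf_counter()
root = ET.Element("records")
for r in records:
    rec = ET.SubElement(root, "record", id=str(r["id"]))
    ET.SubElement(rec, "name").text = r["name"]
    ET.SubElement(rec, "amount").text = str(r["amount"])
ET.ElementTree(root).write("out.xml")
xml_secs = time.perf_counter() - start

print(f"sequential: {10000 / seq_secs:.0f} rows/sec")
print(f"xml:        {10000 / xml_secs:.0f} rows/sec")
```

The exact ratio depends on the machine and the stage implementation, but the per-row cost of element construction and tag serialisation is the same class of overhead the XML output stage carries.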
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Using XML in DataStage can be very tricky!
From my experience (only one job, but a very complicated XML document — you know, with siblings and optional nodes and other XML stuff) and after reading the XML 2.0 PDF CAREFULLY, I can tell you that the fastest way to deal with it is to manipulate XML chunks using DataStage mechanics (lookups, hash files, etc.). Otherwise you'll find yourself dealing with XPath and XSLT, and finally generating some XSLT using, say, XML Spy and copying it into the XML stage, not knowing how fast it will perform!
Treat the XML as strings with keys (usually there will be a unique attribute for it) and it'll perform at 9,000 rows per sec, as Vincent said!
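The "chunks keyed on a unique attribute" idea can be sketched outside DataStage like this (a Python illustration with a made-up `<orders>` document; the element and attribute names are assumptions): stream the document, serialise each record element back to a string, and store it in a hash keyed on its unique attribute, so downstream logic does plain keyed lookups instead of XPath:

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# A small, invented sample document with a unique id attribute per record.
xml_doc = b"""<orders>
  <order id="A1"><item>widget</item><qty>3</qty></order>
  <order id="B2"><item>gadget</item><qty>7</qty></order>
</orders>"""

# Stream the document and capture each <order> chunk as a serialised
# string keyed on its id -- the in-memory stand-in for a hash file.
chunks = {}
for event, elem in ET.iterparse(BytesIO(xml_doc), events=("end",)):
    if elem.tag == "order":
        chunks[elem.get("id")] = ET.tostring(elem, encoding="unicode")
        elem.clear()  # free the parsed subtree as we go

print(chunks["A1"])
```

Lookups against `chunks` are then ordinary hashed key lookups, which is exactly where DataStage is fast.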
It's a good point. XML files can be read as sequential files as long as you parse the text around the data yourself, although overheads that the XML stage normally handles, such as DTD validation, will be bypassed. However, I think reading from XML and writing to a non-XML target is a lot easier than the other way around. You will be able to import the XML definition and load it into your XML reader stage, which will save a lot of time; you can then move this data into a normal transform and break it down into the relational table columns.
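The read-and-shred direction described above can be sketched as follows (a Python illustration, not the XML reader stage itself; the `<customers>` layout and column names are invented): parse the document and flatten each record into a relational row.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample input -- one <customer> element per record.
xml_doc = """<customers>
  <customer id="1"><name>Alice</name><city>Sydney</city></customer>
  <customer id="2"><name>Bob</name><city>Melbourne</city></customer>
</customers>"""

root = ET.fromstring(xml_doc)

# Break each record down into flat relational columns.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["id", "name", "city"])
for cust in root.findall("customer"):
    writer.writerow([cust.get("id"), cust.findtext("name"), cust.findtext("city")])

print(out.getvalue())
```

Going the other way (relational rows back into nested XML) means reassembling the hierarchy, which is why it tends to be the harder direction.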
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
We are realtime, so performance is measured differently — i.e. how fast a single row is processed versus how many rows per second — but...
XMLReader was faster than XML Input because it has less functionality. Our XML was no more than 5,000 bytes long. It took somewhere in the range of 0.1-0.5 secs per row. The Oracle database write for the parsed row took 0.01-0.06 secs per row in comparison. XML processing is a performance bottleneck. I believe it is the nature of XML and you're not going to get around it. Having said that, I like using it — realtime. It gives good flexibility. I can't imagine attempting to parse XML in a batch mode with thousands of rows a second.
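For the realtime case, the unit that matters is the latency of one parse-and-extract cycle rather than throughput. A rough Python sketch (the flat field layout is an assumption; real messages would be more nested) times a single document of roughly the size described:

```python
import time
import xml.etree.ElementTree as ET

# Build one synthetic message of a few KB -- flat fields, invented names.
payload = "<msg>" + "".join(f"<f{i}>value{i}</f{i}>" for i in range(300)) + "</msg>"

# Time one parse-and-extract cycle, the per-row unit in a realtime flow.
start = time.perf_counter()
root = ET.fromstring(payload)
fields = {child.tag: child.text for child in root}
latency = time.perf_counter() - start

print(f"{len(payload)} bytes parsed in {latency * 1000:.2f} ms")
```

A tuned parser will do much better than a tenth of a second on a document this small, but the point stands: parse cost scales with document complexity, and it dominates the simple column-by-column database write.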
I've been dealing with flat files recently, complex and delimited, and I found XML format to be far more flexible and easier to integrate. I can output an XML file with report data, and applications like Excel can read it directly. It is a great way to pass information if you don't mind the processing overhead.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn