XML with DS 6/7 and Oracle 9i


clshore
Charter Member
Posts: 115
Joined: Tue Oct 21, 2003 11:45 am

Post by clshore »

I'm developing jobs to manipulate sets of records, and the proposal is that the records will be delivered to me encapsulated in an XML document.
I'm concerned that performance may be a problem.

Any members willing to share experience/opinions?

Running DS 7.0 Parallel, currently on a Sun ES15K, but will be deployed on HP-UX, etc.

Thanks,

Carter
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Could you consider using a (perhaps multi-instance) server job, using the XML Reader stage?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Reading and writing XML files is slower than sequential files but has similar performance to database stages. I did a simple test writing out 10 records, and while a job with a sequential output processed at 9000 rows/sec, the same job with an XML output went at 1500 rows/sec. However, XML files sit on the DataStage server, so you'll have no network overhead. With Parallel Extender on a Solaris server it should process very high volumes in a short period of time.

Your Oracle database and your transformations are much more likely to produce performance problems if they are not used correctly. If you are new to DataStage, search through the archive of this site for threads on Oracle bulk load and the benefits of hash file lookups over Oracle lookups.
ariear
Participant
Posts: 237
Joined: Thu Dec 26, 2002 2:19 pm

Post by ariear »

Using XML in DataStage can be very tricky!
From my experience (only one job, but a very complicated XML, you know, with siblings and optional nodes and other XML stuff) and after reading the XML 2.0 PDF CAREFULLY, I can tell you that the fastest way to deal with it is to manipulate XML chunks using DataStage mechanics (lookups, hash, etc.). Otherwise you'll find yourself dealing with XPath and XSLT, and finally generating some XSLT using, say, XML SPY and copying it into the XML stage without knowing how fast it will perform!
Treat the XML as strings with keys (usually there will be a unique attribute for it) and it'll perform at 9000 rows per sec, as Vincent said!
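
A minimal sketch of the "strings with keys" idea, written in Python rather than as a DataStage job. The `<record>` element name and its `id` attribute are hypothetical; the point is only that each chunk is located and keyed by text matching, with no DOM, XPath or XSLT involved. It assumes records are not nested inside each other.

```python
import re

# Hypothetical layout: every <record> element carries a unique "id" attribute.
# The pattern captures the key and the whole raw chunk in one pass.
RECORD_RE = re.compile(
    r'<record\b[^>]*\bid="([^"]*)"[^>]*>.*?</record>',
    re.DOTALL,
)

def index_chunks(xml_text):
    """Map each record's key to its raw XML chunk without parsing the XML."""
    return {m.group(1): m.group(0) for m in RECORD_RE.finditer(xml_text)}

sample = (
    '<records>'
    '<record id="A1"><name>alpha</name></record>'
    '<record id="B2"><name>beta</name></record>'
    '</records>'
)

# The dict plays the role a keyed lookup (e.g. a hash file) would play in a job.
chunks = index_chunks(sample)
```

The keyed chunks can then be joined, filtered or looked up as ordinary strings, and only the ones that really need it have to go through a full XML parse.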
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

It's a good point. XML files can be read as sequential files as long as you parse the text around it. There are some overheads to reading XML files, such as the DTD validation, that will be bypassed. However, I think reading from XML and writing to a non-XML target is a lot easier than the other way around. You will be able to import the XML definition and load it into your XML reader stage, which will save a lot of time. You can then move this data into a normal transform and break it down into the relational table columns.
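
For comparison, a rough sketch of the "read XML, break it down into relational columns" direction, using Python's standard `xml.etree.ElementTree` instead of the XML reader stage. The element and column names (`record`, `id`, `name`, `amount`) are made up for illustration; an optional node that is missing simply comes back as `None`.

```python
import xml.etree.ElementTree as ET

def xml_to_rows(xml_text):
    """Flatten each <record> element into an (id, name, amount) tuple."""
    root = ET.fromstring(xml_text)
    rows = []
    for rec in root.iter("record"):
        rows.append((
            rec.get("id"),           # attribute -> key column
            rec.findtext("name"),    # child element text -> column
            rec.findtext("amount"),  # optional node: None when absent
        ))
    return rows

sample = (
    "<records>"
    '<record id="A1"><name>alpha</name><amount>10.50</amount></record>'
    '<record id="B2"><name>beta</name></record>'
    "</records>"
)

rows = xml_to_rows(sample)
```

Each tuple is then in the shape a relational insert or bulk load would expect, with any type conversion done downstream.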
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis

Post by trobinson »

We are realtime, so performance is measured differently, i.e. how fast to do a single row versus how many rows per second, but...
XMLReader was faster than XML Input because it has less functionality. Our XML was no more than 5000 bytes long. It took somewhere in the range of 0.1-0.5 secs per row. The Oracle database write for the parsed row took 0.01-0.06 secs per row in comparison. XML processing is a performance bottleneck. I believe it is the nature of XML and you're not going to get around it. Having said that, I like using it -- realtime. It gives good flexibility. I can't imagine attempting to parse XML in a batch mode with thousands of rows a second.
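
To put the per-row figures above next to the rows/sec numbers quoted earlier in the thread, here is a small conversion sketch. It assumes a single stream processing one row at a time, and reads the upper Oracle figure as 0.06 secs; the constant and function names are just for illustration.

```python
# Per-row timings quoted above, in seconds: (fastest, slowest).
XML_PARSE_SECS = (0.1, 0.5)
ORACLE_WRITE_SECS = (0.01, 0.06)

def rows_per_sec_range(secs_range):
    """Convert a (fastest, slowest) per-row time into (min, max) rows/sec."""
    fastest, slowest = secs_range
    return (1.0 / slowest, 1.0 / fastest)

xml_rate = rows_per_sec_range(XML_PARSE_SECS)
oracle_rate = rows_per_sec_range(ORACLE_WRITE_SECS)
```

That works out to roughly 2 to 10 rows per second for the XML parse against roughly 17 to 100 for the Oracle write, on a single stream with documents of this size.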
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

I've been dealing with flat files recently, complex and delimited, and I found XML format to be far more flexible and easier to integrate. I can output an XML file with report data, and applications like Excel can read it directly. It is a great way to pass information if you don't mind the processing overhead.