I'm developing jobs to manipulate sets of records where it is proposed that they will be delivered to me encapsulated in an XML document.
I'm concerned that performance may be a problem.
Any members willing to share experience/opinions?
Running DS 7.0 Parallel, currently on a Sun ES15K, but will be deployed on HP-UX, etc.
Thanks,
Carter
XML with DS 6/7 and Oracle 9i
Reading and writing XML files is slower than sequential files, but has similar performance to database stages. I did a simple test writing out 10 records, and while a job with a sequential output processed at 9,000 rows/sec, the same job with an XML output ran at 1,500 rows/sec. However, XML files sit on the DataStage server, so you'll have no network overhead. With parallel extender on a Solaris server it should process very high volumes in a short period of time.
Your Oracle database and your transformations are much more likely to produce performance problems if they are not used correctly. If you are new to DataStage, search through the archive of this site for threads on Oracle bulk load and the benefits of hash file lookups over Oracle lookups.
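To give a feel for the kind of gap Vincent describes, here's a rough Python sketch (not DataStage; the three-column record layout and 10,000-row count are made up for illustration) comparing throughput for a delimited sequential write versus building and serialising the same rows as XML:

```python
import time
import xml.etree.ElementTree as ET

# Hypothetical records -- layout and volume chosen just for the demo.
records = [{"id": i, "name": f"name{i}", "amount": i * 1.5} for i in range(10000)]

# Flat sequential output: one pipe-delimited line per record.
start = time.perf_counter()
with open("out.txt", "w") as f:
    for r in records:
        f.write(f"{r['id']}|{r['name']}|{r['amount']}\n")
seq_secs = time.perf_counter() - start

# XML output: wrap each record in an element and serialise the tree.
start = time.perf_counter()
root = ET.Element("records")
for r in records:
    rec = ET.SubElement(root, "record", id=str(r["id"]))
    ET.SubElement(rec, "name").text = r["name"]
    ET.SubElement(rec, "amount").text = str(r["amount"])
ET.ElementTree(root).write("out.xml")
xml_secs = time.perf_counter() - start

print(f"sequential: {10000 / seq_secs:.0f} rows/sec")
print(f"xml:        {10000 / xml_secs:.0f} rows/sec")
```

The exact ratio depends on the machine and the stage implementation, but the per-row cost of element construction and tag serialisation is the same class of overhead the XML output stage carries.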
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Using XML in DataStage can be very tricky!
From my experience (only one job, but a very complicated XML document — you know, with siblings and optional nodes and other XML stuff) and after reading the XML 2.0 PDF CAREFULLY, I can tell you that the fastest way to deal with it is to manipulate XML chunks using DataStage mechanics (lookups, hash files, etc.). Otherwise you'll find yourself dealing with XPath and XSLT, and finally generating some XSLT using, say, XML Spy and copying it into the XML stage, not knowing how fast it will perform!
Treat the XML as strings with keys (usually there will be a unique attribute for it) and it'll perform at 9,000 rows per sec, as Vincent said!
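The "chunks keyed on a unique attribute" idea can be sketched outside DataStage like this (a Python illustration with a made-up `<orders>` document; the element and attribute names are assumptions): stream the document, serialise each record element back to a string, and store it in a hash keyed on its unique attribute, so downstream logic does plain keyed lookups instead of XPath:

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# A small, invented sample document with a unique id attribute per record.
xml_doc = b"""<orders>
  <order id="A1"><item>widget</item><qty>3</qty></order>
  <order id="B2"><item>gadget</item><qty>7</qty></order>
</orders>"""

# Stream the document and capture each <order> chunk as a serialised
# string keyed on its id -- the in-memory stand-in for a hash file.
chunks = {}
for event, elem in ET.iterparse(BytesIO(xml_doc), events=("end",)):
    if elem.tag == "order":
        chunks[elem.get("id")] = ET.tostring(elem, encoding="unicode")
        elem.clear()  # free the parsed subtree as we go

print(chunks["A1"])
```

Lookups against `chunks` are then ordinary hashed key lookups, which is exactly where DataStage is fast.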
It's a good point. XML files can be read as sequential files as long as you parse the text around the data yourself, although overheads that the XML stage normally handles, such as DTD validation, will be bypassed. However, I think reading from XML and writing to a non-XML target is a lot easier than the other way around. You will be able to import the XML definition and load it into your XML reader stage, which will save a lot of time; you can then move this data into a normal transform and break it down into the relational table columns.
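The read-and-shred direction described above can be sketched as follows (a Python illustration, not the XML reader stage itself; the `<customers>` layout and column names are invented): parse the document and flatten each record into a relational row.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample input -- one <customer> element per record.
xml_doc = """<customers>
  <customer id="1"><name>Alice</name><city>Sydney</city></customer>
  <customer id="2"><name>Bob</name><city>Melbourne</city></customer>
</customers>"""

root = ET.fromstring(xml_doc)

# Break each record down into flat relational columns.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["id", "name", "city"])
for cust in root.findall("customer"):
    writer.writerow([cust.get("id"), cust.findtext("name"), cust.findtext("city")])

print(out.getvalue())
```

Going the other way (relational rows back into nested XML) means reassembling the hierarchy, which is why it tends to be the harder direction.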
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
We are realtime, so performance is measured differently — i.e. how fast a single row is processed versus how many rows per second — but...
XMLReader was faster than XML Input because it has less functionality. Our XML was no more than 5,000 bytes long. It took somewhere in the range of 0.1-0.5 secs per row. The Oracle database write for the parsed row took 0.01-0.06 secs per row in comparison. XML processing is a performance bottleneck. I believe it is the nature of XML and you're not going to get around it. Having said that, I like using it — realtime. It gives good flexibility. I can't imagine attempting to parse XML in a batch mode with thousands of rows a second.
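For the realtime case, the unit that matters is the latency of one parse-and-extract cycle rather than throughput. A rough Python sketch (the flat field layout is an assumption; real messages would be more nested) times a single document of roughly the size described:

```python
import time
import xml.etree.ElementTree as ET

# Build one synthetic message of a few KB -- flat fields, invented names.
payload = "<msg>" + "".join(f"<f{i}>value{i}</f{i}>" for i in range(300)) + "</msg>"

# Time one parse-and-extract cycle, the per-row unit in a realtime flow.
start = time.perf_counter()
root = ET.fromstring(payload)
fields = {child.tag: child.text for child in root}
latency = time.perf_counter() - start

print(f"{len(payload)} bytes parsed in {latency * 1000:.2f} ms")
```

A tuned parser will do much better than a tenth of a second on a document this small, but the point stands: parse cost scales with document complexity, and it dominates the simple column-by-column database write.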
I've been dealing with flat files recently, complex and delimited, and I found XML format to be far more flexible and easier to integrate. I can output an XML file with report data, and applications like Excel can read it directly. It is a great way to pass information if you don't mind the processing overhead.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn