XML Parsing in datastage and Oracle

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
samyamkrishna
Premium Member
Premium Member
Posts: 258
Joined: Tue Jul 04, 2006 10:35 pm
Location: Toronto

XML Parsing in datastage and Oracle

Post by samyamkrishna »

Hi All,

We have a couple of jobs which read XML data from DB2 and loads into oracle.
datastage is being used to extract the data and do parse the XML data into tabular format and load it into Oracle.

These jobs are running for 4 hours each.

What i am planning to do now is to load the data from DB2 to Oracle and do the XML parsing in the Oracle query.

the amount of data is approximatly 24000000 records.

wanted to check if this will give any kind of benifit.

Thanks in Advance,
Samyam
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'm not sure that question is really answerable. Seems to me the only way to really know would be to try it both ways on your system with your data and compare the results.
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

It's a comparison that can't be made on pure speed or technology, regardless of which parser does it faster.

Databases have had data types for xml for a long time....each has some very interesting and unique technology for it. Nice support for certain xml constructs, compression, etc.

...but many sites also have to evaluate what the database is used for, who the users are that access it, their xml expertise, what is their experience with anything beyond the simplest sql, what APIs/ODBC drivers, etc. are being used...

I find there are still a HUGE number of sites that can't even remotely look at xml "in the database" because of the different users, applications, drivers, environments, releases (of everything) that they have to support. They simply have to shred the xml, whether by hand or with DataStage or some other ETL tool.

Craig is right....you will simply have to do the test yourself to be certain...the size of the xml, the volume of xml documents, the number of elements, the complexity of the schema (and more) are just "some" of the variables that can impact this. ...just be certain to also consider the "non-technical" variables that may (or may not) be important in your decision.

Please share your results!

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
Post Reply