Loading XML data from a URL into a SQL Server table

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
sbass1
Premium Member
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Loading XML data from a URL into a SQL Server table

Post by sbass1 »

Hi,

I'm trying to read an XML file and output to a SQL Server table. I've reviewed this tutorial:

http://www.ibm.com/developerworks/data/ ... ng/#partII

However, when I tick "Include namespace declaration" and click "Load...", it appears to parse the XML file correctly, but does not create the column definitions - the Columns tab remains blank.

The file I need to import is located here:

http://www.rba.gov.au/RSS/rss_cb_exchange_rates.xml

and consists of exchange rates as defined by the Reserve Bank of Australia updated on a daily basis.

Questions:

1) I'd prefer to load this data from the URL, rather than creating an external process to download it each day. How do I specify the URL as the input to the XML Input stage?

2) What do I need to do to get Datastage to parse the XML file (URL) and define the appropriate columns?

3) If I specify the URL as the input to the XML Input stage, then Load the namespace declarations, are they loaded dynamically from the URL, or do I first need to make a copy of the file locally?

Thanks,
Scott
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Interesting topic..... I haven't tried bringing in XML via RSS feed within the XMLInput Stage, but it's certainly a possibility. However, RSS response format, while "xml" in structure, is not a formal xml document (doesn't have the correct header, mostly).

...so...step one is to first save your RSS feed payload....add an xml header, and then do the import..... the importer is expecting normal xml..... and that's when you will pick up your namespaces also.

Then get the job to work perfectly reading a flat file...once you know the metadata and your retrieval is correct (reading xml can be simple or difficult --- check out the many threads here), turn it on to the RSS source.....

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

...also... the easiest way to pass in the URL is just to hard code it in an upstream Transformer or read it in via flat file and pass it into the XMLInput Stage inside of one large string column. URL is one of the options on the Stage.....

This is a cool one if it works.

I've been experimenting a lot with RSS with DataStage, where DS is the "source" of the RSS feed..... this is the opposite and opens up a lot of possibilities...

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Just tried this......and got hit with a xalan error......[The schemaLocation attribute does not contain pairs of values.].... which means it found the url, but isn't happy with the content it found...and that could mean a lot of things..... be prepared for a lot of playing with variations, and possibly needing a more pure RSS solution such as calling a custom class in Java....it also could be something simple, but some digging is in order. Do you have a variety of RSS urls like this one for comparison and contrast?

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
sbass1
Premium Member
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

Hi Ernie,

Thanks for all your replies. Much and genuinely appreciated. I'm neither an XML nor RSS guru (I've played with it some), and this appears to be falling into the too hard basket.

So, things I've tried:

* opened a local copy of the RSS XML file in Eclipse (I suspect any XML-aware editor would do). It seemed to parse the file OK.

* parsed the file with XMLStarlet. See http://xmlstar.sourceforge.net/. It too seemed to parse the file OK.

* someone else in our team has written a Perl script to parse the file and write out a CSV file. If you like, I can send you the Perl script (it's not long at all). Probably via PM since this isn't a Perl forum. But that's only if you're curious :wink:

So, we do have a solution, but it's clunky as hell:

* ftp the RSS XML file
* run the Perl script
* read the CSV file into DS

Now, since DS is such a wonderful, whizbang, greatest ETL tool on Earth...AND owned by IBM, which is the icing on the cake...I was hoping I could implement a pure DS solution.

But the reality is I don't have weeks to work on this (and I suspect neither do you), so I may have to go with clunky since I can't make DS work its magic...

Thanks,
Scott

P.S.: OT: can DS read directly from a pipe, i.e. directly from the output of a command, i.e. the Perl CSV written to stdout?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

sbass1 wrote:P.S.: OT: can DS read directly from a pipe, i.e. directly from the output of a command, i.e. the Perl CSV written to stdout?
Pipes and stdout are two different things, but yes the Sequential File stage supports both. The former by a specific Pipe read/write enabling option in the stage and the latter via the 'Filter' option where the Filter command writes data to stdout and DataStage reads stdin as if it were a file.
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Less "clunky" would be to just read the xml payload from RSS directly from disk if you are already able to get it to disk..... Reading the XML is do-able..it's the http bits at run time combined with the xml parsing that are complicating things. Once you have the xml, even if the namespaces give us problems, they can just be edited out......

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
Post Reply