Using XML files as source

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

xyz_chatter
Participant
Posts: 47
Joined: Fri Oct 26, 2007 7:15 pm
Location: India

Post by xyz_chatter »

Hi,

I've never used the XML stages before. I'd like to know whether there are any advantages or disadvantages to using XML compared with a sequential file as a source.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Think about it.

If your source is XML documents there's a clear benefit of being able to read them and parse the data therein into rows and columns without needing to code it yourself.

Using a Sequential File stage you would (unnecessarily) have to parse out all the XML tags yourself and recognize complex nested structures, repeating groups, and all kinds o' mean, nasty, ugly, horrible stuff (apologies to Arlo Guthrie).
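To give a feel for what "parse the data therein into rows and columns" involves, here is a minimal Python sketch (not DataStage code) that flattens one repeating group by hand. The document layout and element names (`customer`, `order`, `amount`) are hypothetical; the XML stages are meant to do this kind of mapping for you.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: each <customer> carries a repeating <order> group.
SAMPLE = """<customers>
  <customer id="C1">
    <name>Acme</name>
    <order no="1001"><amount>250.00</amount></order>
    <order no="1002"><amount>75.50</amount></order>
  </customer>
  <customer id="C2">
    <name>Globex</name>
    <order no="2001"><amount>10.00</amount></order>
  </customer>
</customers>"""


def flatten_orders(xml_text):
    """Return one row per <order>, repeating the parent customer's columns."""
    root = ET.fromstring(xml_text)
    rows = []
    for cust in root.findall("customer"):
        cust_id = cust.get("id")       # attribute on the parent element
        name = cust.findtext("name")   # child element text
        for order in cust.findall("order"):  # the repeating group
            rows.append((cust_id, name, order.get("no"),
                         float(order.findtext("amount"))))
    return rows


for row in flatten_orders(SAMPLE):
    print(row)
```

Every extra nesting level or optional element adds more of this hand-written logic, which is the work the XML stages are there to save.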
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I assume the OP was asking about XML format versus a 'normal' flat file as a source, not about trying to read XML with a Sequential file stage. Let us know, xyz.

Perhaps I'm an old fart, but I find XML an enormous PITA - unnecessarily large, unnecessarily complex, needing to be sucked into memory to parse, yada yada... yuck. I'm not sure what in the heck the so-called 'advantage' of it is. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
xyz_chatter
Participant
Posts: 47
Joined: Fri Oct 26, 2007 7:15 pm
Location: India

Post by xyz_chatter »

Thanks for the prompt reply.

Yeah chulett, I was asking about XML format versus a 'normal' flat file as a source. Thanks again for your reply. What I understand is that if we have the option of choosing either an XML Input stage or a flat file stage as the source, we should go with the flat file stage, since there is no advantage to using the XML stage.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That would be my opinion, yes. I'm sure others here will disagree and I'd be curious to hear their reasoning. :wink:
-craig
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

It's an interesting discussion. However, in most cases the DataStage developer doesn't have a choice. XML or flat file is a consequence of some "other" application or partner that is providing the source data. It either "is" XML or it "isn't".

If you had a choice, I'd say it's a toss-up that has to weigh not only the technical issues but also the business and management issues: where the XML (or not) is coming from, what will happen with that source in the future, how much data volume there is, and what other purposes the XML format serves for the data (maybe it's a single source that is used for display on special workstations, in transaction formats, and also as a data shipment medium). No one chooses XML anymore "just because". There was a time when they did, because they thought it was "cool", but rarely anymore.

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
jatayl
Premium Member
Posts: 47
Joined: Thu Jan 19, 2006 11:20 am
Location: Rogers, AR

Post by jatayl »

I was asked to write a couple of pilot jobs for different input sources to load a table, one sequential and the other XML. I found that with many sources it was more beneficial to use flat files rather than XML files, because I could identify the table/schema file and write one job that loads the multiple files into their respective tables using RCP (runtime column propagation). With XML, RCP was not an option: I had to use a stylesheet or define all possible output columns to the table, and then tell the table to drop the columns not needed.
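As a rough analogy only (plain Python, not DataStage), the idea of one generic load driven by a per-table schema looks something like the sketch below. The `SCHEMAS` mapping stands in for schema files, and the table names, columns and `|` delimiter are all hypothetical.

```python
import csv
import io

# Hypothetical column lists, one per target table; they play the role a
# schema file plays for a job running with RCP enabled.
SCHEMAS = {
    "customers": ["cust_id", "name", "region"],
    "products": ["sku", "description", "price"],
}


def load_delimited(table, text, delimiter="|"):
    """Read any delimited file using the column list for its table, so one
    routine serves every layout instead of one hand-mapped job per file."""
    columns = SCHEMAS[table]
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(fields) != len(columns):
            raise ValueError(
                f"{table}: expected {len(columns)} fields, got {len(fields)}")
        rows.append(dict(zip(columns, fields)))
    return rows


# Hypothetical file contents keyed by target table.
sources = {
    "customers": "C1|Acme|West\nC2|Globex|East\n",
    "products": "S-9|Widget|4.25\n",
}
for table, text in sources.items():
    print(table, load_delimited(table, text))
```

Adding a new flat file then means adding a schema entry rather than a new job. With nested XML there is no equally simple "one row = one record" rule to lean on, which matches the stylesheet-or-define-every-column experience described above.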

Bottom line. I wrote the jobs both ways, but advised that the incoming source be a flat file. My preference, flat file.

Just my $0.02.