
Using XML files as source

Posted: Wed Mar 19, 2008 12:52 am
by xyz_chatter
Hi,

Never used XML stages before. I would like to know whether there are any advantages/disadvantages to it over using a sequential file as a source.

Thanks

Posted: Wed Mar 19, 2008 2:25 am
by ray.wurlod
Think about it.

If your source is XML documents there's a clear benefit of being able to read them and parse the data therein into rows and columns without needing to code it yourself.

Using a Sequential File stage you would have (unnecessarily) to parse out all the XML tags, and recognize complex nested structures, repeating groups, and all kinds o' mean, nasty, ugly, horrible stuff (apologies to Arlo Guthrie).
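To make the point above concrete, here is a minimal sketch of what "parse the data into rows and columns" means for a nested document with a repeating group. It is not DataStage code; it uses Python's standard `xml.etree.ElementTree` parser, and the document layout (`orders`, `order`, `items`, `item`) and the function name `flatten_orders` are hypothetical, invented only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: each <order> contains a repeating <item> group.
SAMPLE_XML = """
<orders>
  <order id="1001">
    <customer>Acme</customer>
    <items>
      <item sku="A1" qty="2"/>
      <item sku="B7" qty="1"/>
    </items>
  </order>
  <order id="1002">
    <customer>Globex</customer>
    <items>
      <item sku="C3" qty="5"/>
    </items>
  </order>
</orders>
"""


def flatten_orders(xml_text):
    """Return one row (dict) per repeating <item>, repeating the parent fields."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for order in root.findall("order"):
        order_id = order.get("id")
        customer = order.findtext("customer")
        # The repeating group becomes multiple rows for the same order.
        for item in order.findall("./items/item"):
            rows.append({
                "order_id": order_id,
                "customer": customer,
                "sku": item.get("sku"),
                "qty": int(item.get("qty")),
            })
    return rows


rows = flatten_orders(SAMPLE_XML)
for row in rows:
    print(row)
```

The parser deals with tags, nesting, and attributes; the only hand-written logic is the choice of which element repeats. Doing the same with line-by-line string handling would mean re-implementing all of that yourself.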

Posted: Wed Mar 19, 2008 7:26 am
by chulett
I assume the OP was asking about XML format versus a 'normal' flat file as a source, not about trying to read XML with a Sequential file stage. Let us know, xyz.

Perhaps I'm an old fart, but I find XML an enormous PITA - unnecessarily large, unnecessarily complex, needing to be sucked into memory to parse, yada yada... yuck. I'm not sure what in the heck the so-called 'advantage' of it is. :?

Posted: Wed Mar 19, 2008 7:43 am
by xyz_chatter
Thanks for the prompt reply.

Yeah chulett, I was asking about XML format versus a 'normal' flat file as a source. Thanks for your reply again. What I understood is: if we have the option to choose either an XML Input stage or a flat file stage as a source, then we should go for the flat file stage, since there is no advantage to using the XML stage.

Posted: Wed Mar 19, 2008 7:54 am
by chulett
That would be my opinion, yes. I'm sure others here will disagree and I'd be curious to hear their reasoning. :wink:

Posted: Wed Mar 19, 2008 10:50 am
by eostic
....it's an interesting discussion... however, in most cases, the DataStage developer doesn't have a choice. XML or flat file is a consequence of some "other" application or partner that is providing the source data...it either "is" xml or it "isn't".

If you had a choice, I'd say it's a toss-up that has to weigh not only the technical issues, but also the business and management issues (i.e., where the XML [or not] is coming from, what is happening with said source in the future, how much data volume there is, what other purposes the XML format has for the data [e.g., maybe it's a single source that is used for display on special workstations, in transaction formats, and also as a data shipment medium]). No one chooses XML anymore "just because". [There was a time when they did, because they thought it was "cool", but rarely anymore.]

Ernie

Posted: Thu Mar 20, 2008 1:28 pm
by jatayl
I was approached to write a couple of pilot jobs for different input sources to load a table, one sequential and the other XML. I found that if there are many sources, it was more beneficial to use the flat file versus an XML file, because I could identify the table/schema file, and write one job to load the multiple files into their respective tables using RCP. With XML, RCP was not an option. I had to use a stylesheet or define all possible output columns to the table, and then tell the table to drop columns not needed.
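As a rough analogy for the "one job, many files" idea above, here is a minimal sketch of a schema-driven loader. It is not DataStage and does not reproduce how RCP works internally; it only shows the general pattern of keeping column definitions outside the job so one generic routine can handle several flat files. The schema dictionary, table names, pipe delimiter, and the function `load_flat_file` are all assumptions made up for this example.

```python
import csv
import io

# Hypothetical schemas, standing in for external schema files:
# table name -> ordered list of column names.
SCHEMAS = {
    "customers": ["customer_id", "name"],
    "orders": ["order_id", "customer_id", "amount"],
}


def load_flat_file(table, text, schemas=SCHEMAS, delimiter="|"):
    """Parse delimited text into rows, using the column list for the given table."""
    columns = schemas[table]
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not record:
            continue  # skip blank lines
        if len(record) != len(columns):
            raise ValueError(
                f"{table}: expected {len(columns)} fields, got {len(record)}"
            )
        rows.append(dict(zip(columns, record)))
    return rows


# The same routine handles both files; only the schema lookup differs.
customers = load_flat_file("customers", "1|Acme\n2|Globex\n")
orders = load_flat_file("orders", "1001|1|250.00\n")
print(customers)
print(orders)
```

Adding another source here means adding a schema entry, not writing another loader, which is roughly the benefit described for the flat-file jobs.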

Bottom line: I wrote the jobs both ways, but advised that the incoming source be a flat file. My preference: flat file.

Just my $0.02.