XML vs. Flat Files

rkumar28
Participant
Posts: 43
Joined: Tue Mar 30, 2004 9:39 am

XML vs. Flat Files

Post by rkumar28 »

Hi,

I would like a recommendation on flat files vs. XML. My company is debating whether to use XML or a flat file as a SOURCE in DataStage. The data size will be around 300,000 rows.

I need recommendations on the pros and cons of using XML over flat files. Is XML more beneficial than a flat file in DataStage?

Thanks for any advice and time.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Flat file is fastest, if for no other reason than that it's easier to parse the data from it based on metadata definitions. You only need to process flat file metadata once (assuming the format/content doesn't change, which is not always the case!). With XML there's the overhead of processing the tags and verifying them, as well as processing the data. For the small added safety it offers in getting things right, I believe that the cost of using XML over text files is unwarranted. If you have the choice, go for flat files every time! Of course, if you don't have the choice, and your data are being delivered in XML format, then DataStage can handle that, too.
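To make the parsing overhead described above concrete, here is a minimal sketch in Python (not DataStage) that writes the same records as a delimited flat file and as XML, then parses both back. The sample rows, the column names, and the helper functions are all hypothetical and chosen only for illustration; this says nothing about how DataStage itself implements either stage.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same three records used for both formats.
columns = ("id", "name", "city")
rows = [("1", "Alice", "Sydney"), ("2", "Bob", "Denver"), ("3", "Carol", "Austin")]


def to_flat(records):
    """Serialize records as comma-delimited text, one record per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(records)
    return buf.getvalue()


def to_xml(records):
    """Serialize records as XML, one <row> element per record."""
    root = ET.Element("rows")
    for record in records:
        rec = ET.SubElement(root, "row")
        for col, value in zip(columns, record):
            ET.SubElement(rec, col).text = value
    return ET.tostring(root, encoding="unicode")


def parse_flat(text):
    """Column positions come from the metadata (the column list), not the file."""
    return [tuple(r) for r in csv.reader(io.StringIO(text))]


def parse_xml(text):
    """Every tag has to be read and matched before the data can be extracted."""
    root = ET.fromstring(text)
    return [tuple(rec.findtext(col) for col in columns) for rec in root.iter("row")]


flat_text = to_flat(rows)
xml_text = to_xml(rows)

print(len(flat_text), "characters as flat file")
print(len(xml_text), "characters as XML")
```

Both parsers return the same records, but the XML text carries an opening and closing tag for every field of every row, so there is more to read and more to check per record. Wrapping `parse_flat` and `parse_xml` in `timeit` on a file of realistic size would show how much that costs on a given machine.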
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rkumar28
Participant
Posts: 43
Joined: Tue Mar 30, 2004 9:39 am

Post by rkumar28 »

Do we have to write a parser for the XML file in DataStage, or can DataStage handle reading and writing XML by itself?

Thanks

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

DataStage version 7.x has XML reader and XML writer stage types.

Consult on-line help or the relevant manual (these are in your DataStage client install folder, in the Docs sub-folder). The relevant manual in this case is XML PACK Designer Guide (XMLPACK_20_Designer.pdf).

In response to your private message, technically XML files can be transmitted by FTP. They are, after all, still pure text (the tags as well as the data are text). Politically is an entirely different question!
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ray has already pretty much covered this, just wanted to throw another log of opinion onto the fire. Warning, this is all "IMHO" stuff coming. :wink:

I know that XML is all the rage among some circles and I've had people shake their heads and mutter about me being a Luddite for not choosing to use XML at every opportunity. That's the key word to me - choice. If you've got one, stick with flat files. When you need to process XML, either as a Source or Target, then go for it. Study the XML docs and make sure you understand the limitations of the different approaches. Otherwise, stick with something that's generally smaller, more flexible and faster to process - the dreaded sequential file.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

:idea:

Ultimately, performance is mainly about the work you can avoid doing.
asadi
Participant
Posts: 10
Joined: Sun Nov 02, 2003 9:18 pm

Post by asadi »

I know you guys have already touched on performance, but I would just reiterate that the XML parser in DataStage is very slow. As Ray said, stick to flat files.