XML Input Stage File Size Constraint

dav_mcnair
Premium Member
Posts: 35
Joined: Thu Apr 19, 2007 12:42 pm

Post by dav_mcnair »

Does anyone know if there is a size constraint for the XML Input stage? I have read several conflicting posts and want to get some opinions. I am trying to set a strategic direction on using XML vs. flat files and really need some input.

Thanks

David
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What kind of sizes would you be dealing with? There are definite size limits when using the traditional 'two fields, all content at once' methodology with the Folder stage. Cutting that back to just one field to pass the filename and letting the XML Input stage read the file directly pretty much removes that limit.

Problem with the size limit is it seems to vary from install to install, O/S to O/S, etc. My limit was around 50MB from what I recall. And I say 'pretty much' on the second methodology as I've been able to parse XML files in the 500MB range but I've read about others having issues with GB sized files. To me, XML files that big are just insane, but sometimes we have to live in an insane world. :wink:

Hope that helps.
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Exactly. And be careful if you go to EE (Enterprise Edition). Server is much more flexible and forgiving in terms of handling variable length text.

Otherwise you'll probably run into OS limits (the whole document for XMLInput has to be read into memory before anything happens) long before you hit DS limits.

Ernie
mystuff
Premium Member
Posts: 200
Joined: Wed Apr 11, 2007 2:06 pm

Post by mystuff »

chulett wrote: What kind of sizes would you be dealing with? There are definite size limits when using the traditional 'two fields, all content at once' methodology with the Folder stage. Cutting that back to just one field to pass the filename and letting the XML Input stage read the file directly pretty much removes that limit.

How does this affect the size limit? I mean, why would the traditional 'two fields, all content at once' methodology with the Folder stage impose a file size limit, whereas the other doesn't?

chulett wrote: Problem with the size limit is it seems to vary from install to install, O/S to O/S, etc. My limit was around 50MB from what I recall. And I say 'pretty much' on the second methodology as I've been able to parse XML files in the 500MB range but I've read about others having issues with GB sized files. To me, XML files that big are just insane, but sometimes we have to live in an insane world.


Do you recall any of those issues faced with GB sized files?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

With the 'two field' way, the Folder stage brings the entire XML file into memory and then passes it as one record / field to the XML Input stage. With the other, the XML Input stage does its own dirty work. It's better at it.

In all cases, the 'issues' are the same - your job falls over dead. Nothing more than that, bang abby-normal termination dead.
-craig
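
To make the difference between the two handoffs concrete, here is a minimal Python sketch. It is an analogy only, not DataStage code, and it makes no claim about how the XML Input stage works internally. The function names, the sample file, and the `order` element are all hypothetical. The first path carries the whole document down the "link" as one value; the second carries only the file name and lets the parser read the file incrementally.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def folder_two_fields(path):
    """Model of the 'two fields' handoff: file name plus the whole content as one value."""
    with open(path, "rb") as f:
        return os.path.basename(path), f.read()  # entire document sits in one field


def folder_one_field(path):
    """Model of the 'one field' handoff: only the file name travels down the link."""
    return (path,)


def count_from_content(row, tag):
    # The parser is handed the complete document as a single in-memory value.
    _, content = row
    return sum(1 for _ in ET.fromstring(content).iter(tag))


def count_from_filename(row, tag):
    # The parser opens the file itself and consumes it piece by piece.
    (path,) = row
    count = 0
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == tag:
            count += 1
        elem.clear()  # release the content of elements already handled
    return count


def write_sample(n):
    """Write a small hypothetical XML file with n <order/> elements."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write("<orders>")
        for i in range(n):
            f.write(f"<order id='{i}'/>")
        f.write("</orders>")
    return path


sample = write_sample(1000)
via_content = count_from_content(folder_two_fields(sample), "order")
via_filename = count_from_filename(folder_one_field(sample), "order")
os.remove(sample)
print(via_content, via_filename)
```

Both paths give the same answer; what differs is how much data has to exist as a single value at any one moment.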
mystuff
Premium Member
Posts: 200
Joined: Wed Apr 11, 2007 2:06 pm

Post by mystuff »

chulett wrote: In all cases, the 'issues' are the same - your job falls over dead. Nothing more than that, bang abby-normal termination dead.

Why does this happen with the XML stage and not the Sequential stage? I process about 400GB of txt files with the Sequential stage.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Because with the Sequential stage you are reading multiple "rows" in the file, i.e. read a row (until CRLF, for example) and then send that down the link.

With the Folder stage, it's going to "read the whole file as a single row/column" and try to send that down the link.

Presumably, with your flat file, it has many rows, each far smaller than 400G. With the Folder stage, that's it --- one row, one column actually, per whole file.

Ernie
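
The row-versus-whole-file point can be illustrated with a small Python sketch. This is only a model of the idea, not of either stage; the helper names and the sample data are made up for illustration. It compares the largest single value that has to be held at once when a file is read row by row against reading it as one row/column.

```python
import io


def rows(stream):
    """Yield one record at a time, the way a row-oriented reader does."""
    for line in stream:
        yield line.rstrip("\r\n")


def whole_file_as_one_row(stream):
    """Yield the entire content as a single record/column."""
    yield stream.read()


def largest_record(records):
    """Size of the biggest single value that must be held at once."""
    return max(len(r) for r in records)


# Hypothetical flat file: 1000 short rows, each terminated by CRLF.
data = "".join(f"row-{i:04d}\r\n" for i in range(1000))

per_row = largest_record(rows(io.StringIO(data)))
per_file = largest_record(whole_file_as_one_row(io.StringIO(data)))
print(per_row, per_file)
```

Read row by row, the biggest value is one short row no matter how large the file grows. Read as one row, the biggest value is the file itself.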
mystuff
Premium Member
Posts: 200
Joined: Wed Apr 11, 2007 2:06 pm

Post by mystuff »

chulett wrote: your job falls over dead. Nothing more than that, bang abby-normal termination dead.

a) When this kind of error occurs, is there a way to deal with it?
b) If the complete XML file (data) can't be accommodated in physical memory, doesn't the system fall back to the hard disk through paging? The response time would be longer, but I guess the job shouldn't die for that reason, should it?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

a) No, not really, dead is dead and it just becomes another 'abnormal termination' you need to deal with.

b) I suppose, but I'm not privy to the internal workings of the stage.
-craig
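
Since there is no recovering once the job has died, one option is to check the file size before the job ever sees the file. The Python sketch below is a hypothetical pre-flight check run outside DataStage, not a DataStage feature. The function name is invented, and the 50 MB threshold is just the figure quoted earlier in this thread for one install; as noted above, the real limit varies from install to install.

```python
import os
import tempfile

# Assumption: ~50 MB, the limit reported for one install earlier in the thread.
ASSUMED_CONTENT_LIMIT_BYTES = 50 * 1024 * 1024


def choose_handoff(path, limit=ASSUMED_CONTENT_LIMIT_BYTES):
    """Return 'content' if the file is small enough to pass whole, else 'filename'."""
    size = os.path.getsize(path)
    return "content" if size <= limit else "filename"


# Tiny demonstration with a 10-byte temporary file and artificially low limits.
fd, demo_path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write(b"<a>123</a>")

small_enough = choose_handoff(demo_path, limit=100)
too_big = choose_handoff(demo_path, limit=5)
os.remove(demo_path)
print(small_enough, too_big)
```

A wrapper script could use the result to route large files to a job that passes only the file name.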