Large Xml Files
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 27
- Joined: Wed Apr 11, 2007 12:53 am
Large Xml Files
Hi,
We are facing problems reading large XML files (250 MB). The job gives the following error:
PLU_XML..XML_PLU: XML input document value is empty or NULL. Column Name = "Record"
I have tried both the two-parameter and the single-parameter (URL path) approach.
Reading the XML is also pretty slow.
1. Are there any other methods that read XML better?
2. How can we improve performance while reading XML?
3. Can anyone give me the names of XML-to-flat-file converter utilities?
I have read in many posts that large XMLs are insane, but this is what we are getting from the source and we need to process them at our end.
I don't know how anyone can make a blanket statement about 200 MB being a limit for DataStage; there are way too many variables involved. I've successfully processed files nearly twice that size on my system.
Plus, whenever I've had 'size' problems they aren't nearly so nice - the job just falls over dead. Is this one large file that gives this message? Many 'large' files? If it is just one, I'd wonder if the problem lies in the file itself and not its size.
And not everyone can 'write a simple program' to parse XML... what would you suggest to do that? Seeing as how some companies make a living delivering XML tools, not really sure how 'simple' it would really be. [shrug]
-craig
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Premium Member
- Posts: 27
- Joined: Wed Apr 11, 2007 12:53 am
ray.wurlod wrote: Back to basics, can you guarantee that the XML is well-formed?
-- The XML files are well-formed for sure.
-- I am able to process XML files of up to 100 MB, but anything larger fails. The tag we are trying to read has 69 attributes.
It gives the following error message:
"Abnormal termination of stage Xml_test..XML_Input_1 detected"
The box is a 4-CPU machine with 16 GB of physical memory. It's a sun4u SPARC SUNW,Sun-Fire-V440 running SunOS 5.10.
While testing, I make sure that nothing else is running on the box.
Is there any way to avoid using the Folder stage with the XML Input stage?
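On the well-formedness question above: one way to confirm it outside DataStage, without loading the whole document into memory, is to stream the file through a non-validating parser. The following is only a minimal sketch in Python using the standard-library expat bindings. The function name `check_well_formed` and the chunk size are illustrative assumptions, not anything from the thread, and passing this check says nothing about whether the XML Input stage itself can handle the file.

```python
import xml.parsers.expat


def check_well_formed(path, chunk_size=1 << 20):
    """Stream a file through expat and report whether it is well-formed.

    Returns (True, None) on success, or (False, message) on the first
    parse error. The name and defaults are hypothetical, for illustration.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, "rb") as fh:
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:
                    break
                # Feed the document piece by piece; only one chunk is held at a time.
                parser.Parse(chunk, False)
            # Signal end of input so unclosed elements are reported.
            parser.Parse(b"", True)
    except xml.parsers.expat.ExpatError as exc:
        return False, f"line {exc.lineno}, column {exc.offset}: {exc}"
    return True, None
```

Calling `check_well_formed("/path/to/file.xml")` on each large file would show whether a failure is tied to one particular document rather than to size in general.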
If you have used the Folder stage with one column passing only the filename and then set the XML Input stage to 'URL/Filepath' the issue is not with the Folder stage.
Post your question to your Support provider. I think you'll find the issue is your O/S which has... quirks... with XML, including fun things like the 'square root of a negative number' problem, which from what I recall is unique to Sun.
-craig
"You can never have too many knives" -- Logan Nine Fingers
chulett wrote: I don't know how anyone can make a blanket statement about 200MBs being a limit for DataStage, there's way too many variables involved.
a) Can you give me an idea of the variables involved? All I know of is the size of the file, and the limit on that probably depends on the physical memory available.
b) Could there be a rough estimate based on available physical memory? For example, with 16 GB, preferably no more than 200 MB per XML file, something like that?
I haven't seen any good ways to calculate it, unfortunately. Perhaps it's something that could be dug up on the web (the Apache C++ Xerces and Xalan libraries are used deep inside the Stage). I suspect it varies wildly depending on hierarchy, data types, element and attribute name lengths, values, etc.
Ernie
-
- Premium Member
- Posts: 27
- Joined: Wed Apr 11, 2007 12:53 am
Hi,
We have made a directional decision not to use the XML Input plugin. We are trying to flatten the XML files using a Perl program, to which we have also added some business rules.
Our testing on big XML files (more than 100 MB) gave inconsistent results, and the performance was also pathetic.
Anyway, thanks a lot, guys, for all the suggestions.
rgrds
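For anyone landing here with the same problem, the flattening approach described above can be sketched as a streaming parse that writes one flat-file row per record and discards each element once it is written, so memory use stays roughly constant regardless of file size. This is not the poster's Perl program. It is a hedged illustration in Python, and it assumes that each record is a single element carrying its data as attributes (as with the 69-attribute tag mentioned earlier), that the tag name `record` is a placeholder, and that the CSV columns can be taken from the first record seen.

```python
import csv
import xml.etree.ElementTree as ET


def flatten_records(xml_path, csv_path, record_tag="record"):
    """Stream record elements from a large XML file into a CSV file.

    Assumptions (hypothetical, for illustration): every record is an
    element named record_tag whose data lives in its attributes, and the
    first record defines the column set. Attributes that appear only on
    later records are dropped; missing ones are written as empty strings.
    Returns the number of rows written.
    """
    count = 0
    writer = None
    with open(csv_path, "w", newline="") as out:
        context = ET.iterparse(xml_path, events=("start", "end"))
        # The first event is the start of the root element; keep a handle on it.
        _, root = next(context)
        for event, elem in context:
            if event == "end" and elem.tag == record_tag:
                row = dict(elem.attrib)
                if writer is None:
                    writer = csv.DictWriter(
                        out,
                        fieldnames=sorted(row),
                        extrasaction="ignore",
                        restval="",
                    )
                    writer.writeheader()
                writer.writerow(row)
                count += 1
                # Release the processed element and detach finished children
                # from the root so the tree does not grow with the file.
                elem.clear()
                root.clear()
    return count
```

Business rules could be applied to `row` before it is written. Whether this performs better than the plugin on a given box would need to be measured.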