external xml parser

Posted: Tue Nov 21, 2006 9:37 am
by djoni
Has anybody used an external XML parser, rather than the Xalan parser supplied with DataStage?
djoni

Posted: Tue Nov 21, 2006 4:23 pm
by aartlett
Djoni,
I have used the Saxon parser successfully at two sites. It is very fast and reliable.

You need Java 1.5 to run it.

If you need the scripts I used to call it I'll post them, but only if you can't figure it out :)

Posted: Tue Nov 21, 2006 5:12 pm
by djoni
Thanks Andrew.
Can Saxon handle large files, like 2 GB?
djoni

Posted: Tue Nov 21, 2006 10:06 pm
by aartlett
djoni wrote: Can Saxon handle large files, like 2 GB?
It's a Java program, so I doubt it. It tends to blow its stack at about 0.5 to 1 GB, even with ridiculously large stack memory sizes.

Why are you getting such hideous files? XML shouldn't be that big.

Posted: Tue Nov 21, 2006 10:36 pm
by chulett
aartlett wrote: Why are you getting such hideous files? XML shouldn't be that big.
I had the same first thought. There's no reason to be delivering files that big... if someone dropped something like that on my doorstep, I'd push back and insist they chunk it up into more digestible pieces. As in smaller pieces.

Heck, we deliver content to Google and they insisted no individual file be greater than 100MB. They would be fine with 2GB of total files, but... not all in one file, for Pete's sake.
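
For anyone who ends up doing the chunking themselves, here is a minimal sketch of the idea, not a definitive implementation. It assumes Python's standard-library `xml.etree.ElementTree.iterparse`, and that the big file is one root element wrapping many repeated records. The names `record`, `feed`, and `split_records` are hypothetical and only there for illustration.

```python
import io
import xml.etree.ElementTree as ET

def split_records(source, record_tag="record", per_chunk=2, root_tag="feed"):
    """Yield small XML documents, each wrapping up to per_chunk records.

    Assumes (hypothetically) that records are direct children of the root.
    """
    batch = []
    # Request "start" events as well so the first event hands us the root.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            elem.tail = None  # drop whitespace that follows the record
            batch.append(ET.tostring(elem, encoding="unicode"))
            root.clear()      # release the records parsed so far
            if len(batch) == per_chunk:
                yield f"<{root_tag}>{''.join(batch)}</{root_tag}>"
                batch = []
    if batch:
        # Emit whatever is left over as a final, smaller chunk.
        yield f"<{root_tag}>{''.join(batch)}</{root_tag}>"

sample = io.BytesIO(
    b"<feed><record>1</record><record>2</record><record>3</record></feed>"
)
for chunk in split_records(sample, per_chunk=2):
    print(chunk)
```

In real use each chunk would be written to its own file, with `per_chunk` sized so that no file exceeds whatever limit the receiving side sets.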

Posted: Thu Nov 23, 2006 12:56 pm
by djoni
Well..., I'm very fortunate to have this highly paid assignment. If my client had already managed to break these huge XML files into smaller ones... they would never have come to me for help.
Thanks anyway for your thoughts.

Posted: Thu Nov 23, 2006 12:58 pm
by djoni
Thanks again Andrew.
Do you happen to know any non-Java XML parser that's scalable?

Posted: Thu Nov 23, 2006 4:02 pm
by aartlett
Djoni,
Can't think of one off the top of my head. Some Google research will probably find one though.

The problem with a lot of XML parsers is that they have to store the record in memory until they get to its closing tag, sort of like recursion. With the whole document being inside one tag structure, most will have problems. Saxon just has the extra limitation of having to have its stack predefined.
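
To illustrate the memory point above, here is a minimal sketch of the streaming alternative. It is not what Saxon or DataStage do internally. It assumes Python's standard-library `xml.etree.ElementTree.iterparse`, and that the records are direct children of the root element. The tag name `record` and the function `count_records` are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def count_records(source, record_tag="record"):
    """Count <record_tag> elements without keeping the whole tree in memory."""
    count = 0
    # Request "start" events as well so the first event hands us the root.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            count += 1
            # The record is complete: detach it from the root so memory use
            # stays roughly flat however long the document is.
            root.clear()
    return count

sample = io.BytesIO(
    b"<feed><record id='1'>a</record><record id='2'>b</record></feed>"
)
print(count_records(sample))  # prints 2
```

The design point is that only one record at a time has to be held until its closing tag arrives, rather than everything inside the outermost tag.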

You can go for a Xalan/Xerces solution. This is similar to what DataStage uses, but it can be run externally. There is a Java and a C version on the Apache web site (or there was a year or so ago). I tried those and they worked well; in particular, the C version didn't have the memory limitations that the Java version or Saxon had. My problem was I couldn't get the Unix sysadmins to install the libraries I needed.