external xml parser

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

djoni
Participant
Posts: 98
Joined: Wed Oct 05, 2005 1:01 pm

external xml parser

Post by djoni »

Has anybody used an external XML parser, rather than the Xalan parser supplied with DataStage?
djoni
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

Djoni,
I have used the Saxon parser at two sites successfully. It is very fast and reliable.

You need Java 1.5 to run it.

If you need the scripts I used to call it I'll post them, but only if you can't figure it out :)
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
djoni
Participant
Posts: 98
Joined: Wed Oct 05, 2005 1:01 pm

Post by djoni »

Thanks Andrew.
Can Saxon handle large files, like 2GB?
djoni
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

djoni wrote: Can Saxon handle large files, like 2GB?
It's a Java program, so I doubt it. It tends to blow the stack at about 0.5 to 1 GB, even with ridiculously large stack memory sizes.

Why are you getting such hideous files? XML shouldn't be that big.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

aartlett wrote: Why are you getting such hideous files? XML shouldn't be that big.
I had the same first thought. There's no reason to be delivering files that big... if someone dropped something like that on my doorstep, I'd push back and insist they chunk it up into more digestible pieces. As in smaller pieces.

Heck, we deliver content to Google and they insisted no individual file be greater than 100MB. They would be fine with 2GB of total files, but... not all in one file, for Pete's sake.
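For illustration, here is a minimal sketch of that kind of chunking. It is not what either site in this thread actually ran: it assumes Python's standard-library `xml.etree.ElementTree`, a flat layout of repeating records directly under one root element, and no XML namespaces or root attributes that need preserving. The names `split_records`, `wrap`, and the `orders`/`order` sample are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def wrap(root_tag, batch):
    """Wrap serialized records in a bare copy of the root tag."""
    tag = root_tag.encode()
    return b"<" + tag + b">" + b"".join(batch) + b"</" + tag + b">"


def split_records(source, record_tag, records_per_chunk):
    """Yield small, well-formed XML documents (as bytes), each holding
    at most records_per_chunk records."""
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first event is the start of the root element
    batch = []
    for event, elem in events:
        if event == "end" and elem.tag == record_tag:
            batch.append(ET.tostring(elem))
            root.clear()  # release the record that was just serialized
            if len(batch) == records_per_chunk:
                yield wrap(root.tag, batch)
                batch = []
    if batch:  # leftover records go into a final, shorter chunk
        yield wrap(root.tag, batch)


# Tiny in-memory stand-in for a huge input file (hypothetical layout).
sample = io.BytesIO(
    b"<orders><order>1</order><order>2</order><order>3</order></orders>"
)
for number, chunk in enumerate(split_records(sample, "order", 2), start=1):
    print(number, chunk)
```

In real use each chunk would be written to its own file instead of printed, and `records_per_chunk` would be tuned so that no file exceeds whatever size limit the receiving side insists on.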
-craig

"You can never have too many knives" -- Logan Nine Fingers
djoni
Participant
Posts: 98
Joined: Wed Oct 05, 2005 1:01 pm

Post by djoni »

Well..., I'm very fortunate to have this highly paid assignment. If my client had already managed to break these huge XML files into smaller ones... they would never have come to me for help.
Thanks anyway for your thoughts.
djoni
Participant
Posts: 98
Joined: Wed Oct 05, 2005 1:01 pm

Post by djoni »

Thanks again Andrew.
Do you happen to know of any non-Java XML parser that's scalable?

aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

Djoni,
Can't think of one off the top of my head; some Google research will probably find one, though.

The problem with a lot of XML parsers is that they have to hold a record in memory until they reach its closing tag, a bit like recursion. With the whole document sitting inside one tag structure, most will have problems. Saxon just has the extra limitation that its stack size has to be predefined.
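To make the memory point concrete, here is a minimal sketch of the alternative, event-based (streaming) style, where each record is discarded as soon as its closing tag has been seen. It assumes Python's standard-library `xml.etree.ElementTree.iterparse`, which nobody in this thread has tested against files of this size; the function name `count_records` and the `orders`/`order` layout are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def count_records(source, record_tag):
    """Stream through an XML document and count <record_tag> elements,
    keeping roughly one record subtree in memory at a time."""
    count = 0
    # Request "start" events as well, so the first event hands us the root.
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)
    for event, elem in events:
        if event == "end" and elem.tag == record_tag:
            count += 1  # real processing of the finished record would go here
            # Drop the finished record from the root so the tree never grows.
            root.clear()
    return count


# Tiny in-memory stand-in for a multi-gigabyte file (hypothetical layout).
sample = io.BytesIO(
    b"<orders>"
    b"<order id='1'><item>a</item></order>"
    b"<order id='2'><item>b</item></order>"
    b"<order id='3'><item>c</item></order>"
    b"</orders>"
)
print(count_records(sample, "order"))
```

The single enclosing root tag stops being a problem here because the root is emptied after every record, rather than being held until its own closing tag arrives at the very end of the file.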

You can go for a Xalan/Xerces solution. This is similar to what DataStage does, but it can be run externally. There is a Java and a C version on the Apache web site (or there was a year or so ago). I tried both and they worked well; the C version in particular didn't have the memory limitations that the Java version or Saxon had. My problem was that I couldn't get the Unix sysadmins to install the libraries I needed.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.