XMLInput Stage Performance

Posted: Tue Apr 22, 2008 9:44 am
by pavankvk
Hi,

I am using the XMLInput stage to process XML files. The metadata is huge, around 1000 columns. The throughput is very poor: I get around 30 rows/sec for around 150k XML files. I am using PX 7.5.2. Other jobs are giving good throughput.

Is there any specific tuning that needs to be done for the XMLInput stage?

tia

Posted: Tue Apr 22, 2008 4:05 pm
by ray.wurlod
Losing 90% of the columns would be favourite.

Posted: Tue Apr 22, 2008 5:32 pm
by eostic
XML in general is not speedy. Although under the covers DataStage is using the C++ versions of the Apache Xerces parser and Xalan processor, it still has to load the XML document into memory, and that may be where a lot of the time is being spent. The 1000 columns aren't helping either. Here are some things to consider working on. Let us know how some of these play out.

a. Parallelize your input to the XMLInput Stage. Assuming you have a decent multi-CPU machine and can set up a config with four or more nodes, sequentially pick up your list of filenames and then fan them out to multiple XMLInput Stages.

b. Read only chunks of XML in an initial XMLInput Stage, and send these to subsequent XMLInput Stages. Parse a little bit each time, in multiple processes, separating the work into smaller and smaller parts. You can do this by simply having one column on each link, with just the XPath for the higher-level node (stopping before you go all the way down to the text() syntax).
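
To make the two ideas concrete outside of DataStage, here is a rough Python sketch of the same pattern. It is not DataStage code and not how the XMLInput Stage is implemented. The function names, the "order" chunk path, and the column paths are all made up for illustration. It deals the file list out to a few worker processes (idea a), and each worker first cuts a document into higher-level chunks and only then parses each small chunk down to the leaf text (idea b).

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ProcessPoolExecutor


def partition(paths, nodes):
    """Idea (a): deal the file list round-robin into one bucket per node."""
    return [paths[i::nodes] for i in range(nodes)]


def split_chunks(xml_text, chunk_path):
    """Idea (b), first pass: return each higher-level node as an XML string.

    Nothing below the chunk node is turned into columns here.
    """
    root = ET.fromstring(xml_text)
    chunks = []
    for node in root.findall(chunk_path):
        node.tail = None  # drop text that follows the closing tag
        chunks.append(ET.tostring(node, encoding="unicode"))
    return chunks


def parse_chunk(chunk_xml, column_paths):
    """Idea (b), second pass: pull the leaf text for each column from one chunk."""
    node = ET.fromstring(chunk_xml)
    return {name: node.findtext(path) for name, path in column_paths.items()}


def process_bucket(args):
    """Work done by one worker: read its files, split, then parse each chunk."""
    paths, chunk_path, column_paths = args
    rows = []
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for chunk in split_chunks(fh.read(), chunk_path):
                rows.append(parse_chunk(chunk, column_paths))
    return rows


def run(paths, chunk_path, column_paths, nodes=4):
    """Fan the buckets out to `nodes` processes and collect all rows."""
    buckets = [(b, chunk_path, column_paths) for b in partition(paths, nodes) if b]
    with ProcessPoolExecutor(max_workers=nodes) as pool:
        return [row for rows in pool.map(process_bucket, buckets) for row in rows]
```

A call would look something like run(list_of_files, "order", {"id": "id", "name": "customer/name"}), made from under an if __name__ == "__main__": guard so the worker processes can start cleanly. Whether the split actually helps depends on where the time is going, so measure before and after.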

Ernie