XMLInput Stage Performance

pavankvk
Participant
Posts: 202
Joined: Thu Dec 04, 2003 7:54 am

Post by pavankvk »

Hi,

I am using the XMLInput stage to process XML files. The metadata is huge, around 1,000 columns. The throughput is very bad: I get around 30 rows/sec for around 150k XML files. I am using PX 7.5.2. Other jobs are giving good throughput.

Is there any specific tuning that needs to be done for this XML stage?

tia
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Losing 90% of the columns would be favourite.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

XML in general is not speedy. Although under the covers DataStage is using the C++ versions of the Apache Xerces and Xalan parser/processor, it still has to load the XML document into memory. That may be where a lot of the time is being spent. The 1,000 columns aren't helping either. Here are some things to consider working on. Let us know how some of these play out.

a. Parallelize your input to the XMLInput Stage. Assuming you have a decent multi-CPU machine and can set up a config with four or more nodes, sequentially pick up your list of filenames and then fan them out to multiple XMLInput Stages.

b. Read only chunks of XML in an initial XMLInput Stage, and send these to subsequent XMLInput Stages. Parse a little bit each time, in multiple processes, separating the work into smaller and smaller parts. You can do this by simply having one column on each link, with just the XPath for the higher-level node (stopping before you go all the way down to the text() syntax).
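
To make (a) and (b) concrete, here is a rough sketch of both ideas outside DataStage, in plain Python with the standard library. It is not DataStage code and says nothing about how the XMLInput Stage works internally. The function names, the `order` tag, and the sample document are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(filenames, n_nodes):
    """Idea (a): deal the file list out to n_nodes independent parsers."""
    return [filenames[i::n_nodes] for i in range(n_nodes)]


def extract_chunks(xml_text, chunk_tag):
    """Idea (b), first pass: pull out each chunk_tag subtree as an XML
    string, without mapping any of its leaf values yet."""
    root = ET.fromstring(xml_text)
    chunks = []
    for el in root.iter(chunk_tag):
        el.tail = None  # drop text that follows the element's closing tag
        chunks.append(ET.tostring(el, encoding="unicode"))
    return chunks


def parse_chunk(chunk_xml):
    """Idea (b), second pass: parse one small chunk down to the text of
    its direct children (the equivalent of the text() level)."""
    el = ET.fromstring(chunk_xml)
    return {child.tag: (child.text or "").strip() for child in el}


def process_chunks(chunks, workers=4):
    """Hand the chunks to several workers; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(parse_chunk, chunks))


if __name__ == "__main__":
    # Hypothetical sample document, not taken from the original job.
    sample = (
        "<orders>"
        "<order><id>1</id><amt>10</amt></order>"
        "<order><id>2</id><amt>20</amt></order>"
        "</orders>"
    )
    print(partition_round_robin(["a.xml", "b.xml", "c.xml"], 2))
    print(process_chunks(extract_chunks(sample, "order")))
```

Threads are used here only to keep the sketch short. For CPU-bound parsing in Python, a process pool would be closer to what multiple nodes give you in a parallel job.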

Ernie
Ernie Ostic
