How to improve throughput or performance of XML parser job?
Posted: Mon Sep 30, 2013 10:30 am
Hi,
I have a parallel job that has the following stages:
External Source ==> XML ==> Dataset
Number of nodes: 8 (I also tried 2 and 4 nodes, but 8 gave the highest throughput).
Average XML size: approximately 500 KB.
Current throughput: 1,000 XMLs per minute.
Target throughput: 2,500 XMLs per minute or better.
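For reference, the numbers above can be turned into a data rate and a per-node time budget. This is a minimal sketch that assumes 1 KB = 1,024 bytes and an even spread of documents across the 8 nodes. The function name `profile` is hypothetical.

```python
# Back-of-the-envelope throughput figures from the numbers in the post.
# Assumptions: 1 KB = 1024 bytes; documents are spread evenly across nodes.
AVG_DOC_KB = 500
NODES = 8
CURRENT_PER_MIN = 1_000
TARGET_PER_MIN = 2_500


def profile(docs_per_min: int) -> dict:
    """Return the data rate and per-node time budget for a docs/minute rate."""
    mb_per_sec = docs_per_min * AVG_DOC_KB / 1024 / 60
    docs_per_node_per_min = docs_per_min / NODES
    sec_per_doc_per_node = 60 / docs_per_node_per_min
    return {
        "MB/s": round(mb_per_sec, 2),
        "docs/node/min": docs_per_node_per_min,
        "s/doc/node": round(sec_per_doc_per_node, 3),
    }


if __name__ == "__main__":
    print("current:", profile(CURRENT_PER_MIN))
    print("target: ", profile(TARGET_PER_MIN))
    print("required speedup:", TARGET_PER_MIN / CURRENT_PER_MIN)
```

Under these assumptions the job currently processes roughly 8 MB/s, and the target works out to about 20 MB/s. Put another way, each node has to go from about 0.48 s to about 0.19 s per document, which is a 2.5x speedup.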
Schema (XSD of the input XML):
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Call">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="index">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:dateTime" name="startTime"/>
              <xs:element type="xs:string" name="callID"/>
              <xs:element type="xs:string" name="appName"/>
              <xs:element type="xs:string" name="appLanguage"/>
              <xs:element type="xs:string" name="appRegion"/>
              <xs:element type="xs:string" name="ivrName"/>
              <xs:element type="xs:string" name="ivrPort"/>
              <xs:element type="xs:string" name="codeRelease"/>
              <xs:element type="xs:string" name="dataRelease"/>
              <xs:element type="xs:dateTime" name="endTime"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="rptTag" maxOccurs="unbounded" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="name"/>
              <xs:element name="attrib" maxOccurs="unbounded" minOccurs="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element type="xs:string" name="name"/>
                    <xs:element type="xs:string" name="value"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
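To make the required output concrete, here is a minimal sketch of how one document conforming to this schema could be flattened so that every element ends up in the output. It uses Python's standard `xml.etree.ElementTree`, not the DataStage XML stage. The row layout is a hypothetical assumption: one row per rptTag/attrib pair, with the index fields repeated on each row. The column names `rptTag`, `attribName`, and `attribValue` are likewise hypothetical. Because the schema declares no targetNamespace, the sketch looks up elements without a namespace.

```python
import xml.etree.ElementTree as ET

# Children of <index>, in schema order.
INDEX_FIELDS = ["startTime", "callID", "appName", "appLanguage", "appRegion",
                "ivrName", "ivrPort", "codeRelease", "dataRelease", "endTime"]


def flatten_call(xml_text: str) -> list[dict]:
    """Flatten one <Call> document into rows (hypothetical denormalized layout)."""
    root = ET.fromstring(xml_text)
    index = root.find("index")
    # Index fields are repeated on every output row; missing ones become None.
    base = {f: index.findtext(f) for f in INDEX_FIELDS}
    rows = []
    for tag in root.findall("rptTag"):
        tag_name = tag.findtext("name")  # direct child <name> of <rptTag>
        attribs = tag.findall("attrib")
        if not attribs:
            # attrib has minOccurs="0": keep the rptTag with empty attrib columns.
            rows.append({**base, "rptTag": tag_name,
                         "attribName": None, "attribValue": None})
        for a in attribs:
            rows.append({**base, "rptTag": tag_name,
                         "attribName": a.findtext("name"),
                         "attribValue": a.findtext("value")})
    if not rows:
        # rptTag has minOccurs="0": still emit the index fields once.
        rows.append({**base, "rptTag": None,
                     "attribName": None, "attribValue": None})
    return rows
```

With this layout, a document with one rptTag holding two attrib elements and a second rptTag with none yields three rows. Every row carries startTime, callID, and the other index fields.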
I also tried to enable parallelism as described in IBM's documentation (http://pic.dhe.ibm.com/infocenter/iisin ... rsing.html), but without success.
In addition, I need every element (e.g. startTime, callID, etc.) in my output, so I am not sure XML parser parallelism would help even if I could get it to work.
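On the question of whether parallelism conflicts with needing every element, one relevant distinction is between splitting work by whole document and splitting inside a single document. The sketch below shows document-level partitioning: whole documents are assigned round-robin to partitions, and each partition parses its documents completely, so no element is lost. It assumes each XML arrives as an independent record. The partitions are simulated sequentially here, and the helper names `parse_doc`, `round_robin`, and `parse_partitioned` are hypothetical. The sketch does not describe how IBM's parallel parsing option works.

```python
import xml.etree.ElementTree as ET


def parse_doc(xml_text):
    """Return (callID, number of attrib elements) for one <Call> document."""
    root = ET.fromstring(xml_text)
    return root.findtext("index/callID"), len(root.findall("rptTag/attrib"))


def round_robin(docs, n_partitions):
    """Assign whole documents to partitions in round-robin order."""
    parts = [[] for _ in range(n_partitions)]
    for i, doc in enumerate(docs):
        parts[i % n_partitions].append(doc)
    return parts


def parse_partitioned(docs, n_partitions):
    # Simulated sequentially: in a real job each partition would run on its own
    # node. Every document is parsed in full by exactly one partition, so the
    # combined output holds the same elements as a serial parse (order may differ).
    results = []
    for part in round_robin(docs, n_partitions):
        results.extend(parse_doc(d) for d in part)
    return results
```

Under this assumption, the throughput gain comes from running more documents at once rather than from parsing any single document faster.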
Any tips on improving performance to meet the target of 2,500 XMLs per minute or better would be greatly appreciated.
Thanks,
Edgar