We are trying to process XML data (stored in a text column in SQL Server) using the Hierarchical stage in DataStage 11.3 (Windows). The objective is to convert the XMLs into columnar data. Another column in the source table tells us the source of the data (the XML type). We do not yet have an XSD for each XML type, so we are generating the XSDs with XMLSpy and using those. The current jobs have 3 stages, i.e.
ODBC_connector --> Hierarchical_stage (input, parser and output steps) --> Netezza Connector
Problem:
For the XML type shown below, multiple XMLs are stacked one after another, so each value is effectively made up of several concatenated XML documents (all having the same definition). Is there a way to split the XMLs on the fly before they reach the Hierarchical_stage? As the volumes are huge, we are looking to process this without landing the data.
Sample XML below:
...... signifies that there are more tags in the XML.
------ signifies that more XMLs follow. Overall there are 6 to 7 of them; the number varies.
<?xml version="1.0"?>
<SchemeResult Ref="COPC2" Completed="Y" ErrorCount="0">
<PolData Type="Output">
<Vehicle>
<Vehicle_VehiclePrn Val="1.0"/>
<Ncd>
<Ncd_GrantedEntitlementReason Val="11"/>
<Ncd_GrantedPct Val="5.0"/>
<Ncd_GrantedYears Val="9.0"/>
</Ncd>
</Vehicle>
<Cover>
<Cover_VolXsAllowed Val="150.0"/>
<Cover_VehPrn Val="1.0"/>
</Cover>
.......
</PolData>
</SchemeResult><?xml version="1.0"?>
<SchemeResult Ref="ADPC2" Completed="Y" ErrorCount="0">
<PolData Type="Output">
<Vehicle>
<Vehicle_VehiclePrn Val="1.0"/>
<Vehicle_Count Val="13.0"/>
<Ncd>
<Ncd_GrantedEntitlementReason Val="11"/>
<Ncd_GrantedPct Val="1.0"/>
<Ncd_GrantedYears Val="9.0"/>
</Ncd>
</Vehicle>
<Cover>
<Cover_VolXsAllowed Val="150.0"/>
<Cover_VehPrn Val="1.0"/>
</Cover>
........
</PolData>
</SchemeResult>
--------
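
To make the splitting rule we have in mind concrete, here is a minimal Python sketch that runs outside DataStage. It assumes that every stacked document starts with its own `<?xml` declaration, as in the sample above, and that `<?xml` never appears inside a comment or CDATA section. The function name `split_concatenated_xml` and the trimmed sample value are hypothetical and only for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: each concatenated document begins with an "<?xml" declaration.
_DECLARATION = re.compile(r"<\?xml\s")


def split_concatenated_xml(text):
    """Split one stacked value into its individual XML documents."""
    starts = [m.start() for m in _DECLARATION.finditer(text)]
    # Keep any content that precedes the first declaration as its own chunk.
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(text)]
    chunks = (text[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [chunk for chunk in chunks if chunk]


# Trimmed-down stand-in for the sample above (hypothetical, two documents only).
sample = (
    '<?xml version="1.0"?>\n'
    '<SchemeResult Ref="COPC2" Completed="Y" ErrorCount="0">\n'
    '<PolData Type="Output"><Cover><Cover_VehPrn Val="1.0"/></Cover></PolData>\n'
    '</SchemeResult><?xml version="1.0"?>\n'
    '<SchemeResult Ref="ADPC2" Completed="Y" ErrorCount="0">\n'
    '<PolData Type="Output"><Cover><Cover_VehPrn Val="1.0"/></Cover></PolData>\n'
    '</SchemeResult>\n'
)

docs = split_concatenated_xml(sample)
# Parsing each piece confirms that every split document is well-formed on its own.
refs = [ET.fromstring(doc).get("Ref") for doc in docs]
print(refs)
```

Each returned string is a standalone document, so it could be passed to a parser as one row per XML. Whether an equivalent split can be done inside the job ahead of the Hierarchical stage is exactly the open question.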