XML Parsing using XML Stage

Post by **ggarze** » Wed Dec 18, 2013 12:09 pm

I need to parse the XML below and carry the data up to the PONumber level. Am I correct that, with the XML Stage, I need 2 separate links coming out of the stage: one to extract the <ReferenceInfo> data and one to extract the <MilestoneInfo> data, because both sit at the same level under <LineItems><MilestoneMessage>? Once I have the 2 separate outputs, I can join them on the PONumber, which is the parent of both, to get one record per milestone with the PONumber and BLNumber repeated on each, like the output I show after the XML.

Am I missing something that would let me avoid the 2 output links and the later join, or can this be done inside the XML Stage? This is just a sample. The real XML has a host of tags embedded in it, many at the same level as others, and I'm thinking I'd need a bunch of outputs coming out of the stage to capture each one.

<LineItems>
  <PONumber>643730</PONumber>
  <MilestoneMessage>
    <ReferenceInfo>
      <ReferenceId reference="3080951960" type="BLNumber" />
    </ReferenceInfo>
    <MilestoneInfo>
      <MilestoneTypeCode>105</MilestoneTypeCode>
      <MilestoneTypeName>PO Accepted</MilestoneTypeName>
      <City>
        <CityName>Jiangsiu</CityName>
      </City>
    </MilestoneInfo>
    <MilestoneInfo>
      <MilestoneTypeCode>11144</MilestoneTypeCode>
      <MilestoneTypeName>Supplier Booking Accepted</MilestoneTypeName>
      <City>
        <CityName>Shanghai</CityName>
      </City>
    </MilestoneInfo>
  </MilestoneMessage>
</LineItems>

LineItems\MilestoneMessage\ReferenceInfo
LineItems\MilestoneMessage\MilestoneInfo\City

Output after join
PONumber,BLNumber,MilestoneTypeCode,MilestoneTypeName,CityName
643730,3080951960,105,PO Accepted,Jiangsiu
643730,3080951960,11144,Supplier Booking Accepted,Shanghai

Post by **eostic** » Wed Dec 18, 2013 3:45 pm

We can't be entirely sure from this sample, but one could expect that the <ReferenceInfo> node occurs only once for each PO, and the <MilestoneInfo> detail occurs many times. In that case, you can absolutely have everything on one single link.
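To make the single-link case concrete, here is a rough sketch of the same flattening logic in plain Python with `xml.etree.ElementTree`. It is not the XML Stage itself, only an illustration of why one output is enough when the reference occurs once: the repeating <MilestoneInfo> drives the rows, and the single-occurrence values are simply copied onto each one. The function name `flatten` and the `SAMPLE` constant are made up for the example, and it assumes at most one BLNumber reference per <MilestoneMessage>.

```python
import xml.etree.ElementTree as ET

# The sample document from the question, with the browser's "-" markers removed.
SAMPLE = """<LineItems>
  <PONumber>643730</PONumber>
  <MilestoneMessage>
    <ReferenceInfo>
      <ReferenceId reference="3080951960" type="BLNumber" />
    </ReferenceInfo>
    <MilestoneInfo>
      <MilestoneTypeCode>105</MilestoneTypeCode>
      <MilestoneTypeName>PO Accepted</MilestoneTypeName>
      <City>
        <CityName>Jiangsiu</CityName>
      </City>
    </MilestoneInfo>
    <MilestoneInfo>
      <MilestoneTypeCode>11144</MilestoneTypeCode>
      <MilestoneTypeName>Supplier Booking Accepted</MilestoneTypeName>
      <City>
        <CityName>Shanghai</CityName>
      </City>
    </MilestoneInfo>
  </MilestoneMessage>
</LineItems>"""


def flatten(xml_text):
    """Return one row per MilestoneInfo, repeating PONumber and BLNumber.

    Assumption: ReferenceInfo holds at most one BLNumber per MilestoneMessage.
    """
    root = ET.fromstring(xml_text)
    po_number = root.findtext("PONumber")
    rows = []
    for message in root.findall("MilestoneMessage"):
        # Single-occurrence value: looked up once, reused for every milestone.
        ref = message.find("ReferenceInfo/ReferenceId[@type='BLNumber']")
        bl_number = ref.get("reference") if ref is not None else None
        # The repeating node is the only thing that multiplies rows.
        for info in message.findall("MilestoneInfo"):
            rows.append((
                po_number,
                bl_number,
                info.findtext("MilestoneTypeCode"),
                info.findtext("MilestoneTypeName"),
                info.findtext("City/CityName"),
            ))
    return rows
```

Calling `flatten(SAMPLE)` gives two rows, one per milestone, matching the "Output after join" layout in the question without any join step.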

You need a separate link for each "independently occurring" path of nodes and sub-nodes.

Imagine for your PO if you had a sub-node of line items, and for a given PO, there might be (example), 87 line items.

...and for that same PO, you have a sub-node for multiple addresses...and for that same PO, the 87 line items are going to be sent to (example) 7 different locations.

That's when you need multiple output links, just as you would have two separate retrievals if these were related RDBMS tables: seven rows with the PO number for the addresses, and 87 rows for the items ordered (also with the PO number).
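A small sketch of that situation, again in plain Python rather than the XML Stage, and scaled down to three items and two addresses. The XML layout, the element names (`PurchaseOrder`, `LineItem`, `Sku`, `Address`, `City`) and the function `split_links` are all invented for illustration; only the PO number comes from the thread. The point is that the two lists repeat independently, so each becomes its own row set carrying the PO number as the key, much like two output links or two related tables.

```python
import xml.etree.ElementTree as ET

# Hypothetical PO with two independently repeating child lists.
PO_XML = """<PurchaseOrder>
  <PONumber>643730</PONumber>
  <LineItem><Sku>A-100</Sku></LineItem>
  <LineItem><Sku>B-200</Sku></LineItem>
  <LineItem><Sku>C-300</Sku></LineItem>
  <Address><City>Shanghai</City></Address>
  <Address><City>Ningbo</City></Address>
</PurchaseOrder>"""


def split_links(xml_text):
    """Return (item_rows, address_rows), each keyed by the PO number."""
    root = ET.fromstring(xml_text)
    po_number = root.findtext("PONumber")
    # "Link" 1: one row per line item.
    item_rows = [(po_number, item.findtext("Sku"))
                 for item in root.findall("LineItem")]
    # "Link" 2: one row per address; its count is unrelated to the item count.
    address_rows = [(po_number, addr.findtext("City"))
                    for addr in root.findall("Address")]
    return item_rows, address_rows
```

Forcing both lists onto a single row set would mean either a cross product (3 x 2 rows here, 87 x 7 in the example above) or some arbitrary pairing of items with addresses, which is why keeping them separate and relating them downstream by PO number is usually the cleaner choice.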

Then do whatever you need to downstream.

There are alternative things you can do with pivoting and such, depending on the requirement and the variability (or not) of each set of occurrences. But when the nodes are purely variable in occurrence, separate links are typical, and they make perfect sense when the target (90% of the time) is an RDBMS, which is probably modeled in similar fashion, with a table for addresses and another for line items. This is as true for the XML Stage as it is for xmlInput and just about any other technology that is used to parse XML and load it into a target set of relational tables.

Ernie