
XML Parsing Query

Posted: Wed Aug 17, 2016 1:56 am
by SachinCho
Hi All,
I have been trying to parse XML in a parallel job with the following design. We are on v9.1.

Row Generator (holds the value of a sequential file that has a single column containing the XML) ---> XML Input ---> Sequential File

I am trying to parse only a single element (newParty) from the XML. Its XSD looks like this:

<xs:element name='newParty'>
  <xs:complexType>
    <xs:sequence>
      <xs:element ref='userInfo' minOccurs='0'/>
      <xs:element ref='userData' minOccurs='0'/>
    </xs:sequence>
    <xs:attribute name='eventId' use='required'/>
    <xs:attribute name='timeShift' use='required'/>
    <xs:attribute name='userId' use='required'/>
    <xs:attribute name='visibility' use='required'>
      <xs:simpleType>
        <xs:restriction base='xs:string'>
          <xs:enumeration value='ALL'/>
          <xs:enumeration value='INT'/>
          <xs:enumeration value='VIP'/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>

userData in turn has the following declaration in the same XSD:

<xs:element name='userData'>
  <xs:complexType>
    <xs:sequence>
      <xs:element ref='item' minOccurs='0' maxOccurs='unbounded'/>
    </xs:sequence>
  </xs:complexType>
</xs:element>


The sample XML is as follows:

<newParty userId="9032738273CDF" eventId="1" timeShift="0" visibility="ALL">
  <userInfo personId="" userNick="abc" userType="CLIENT" protocolType="FLEX" timeZoneOffset="720" />
  <userData>
    <item key="ChatID">Reactive</item>
    <item key="ChatURL">contactuschat</item>
    <item key="EmailAddress">abc@gmail.com</item>
    <item key="FirstName">abc</item>
    <item key="FromAddress">abc@gmail.com</item>
    <item key="IdentifyCreateContact">3</item>
    <item key="MediaType">chat</item>
    <item key="MessageCount">Agent:0|Customer:0</item>
    <item key="PhoneNumber">123456789</item>
    <item key="Question">qyery1</item>
    <item key="Subject">qyery1</item>
    <item key="TopicID">topic1</item>
  </userData>
</newParty>

We are using eventId as the repetition element key because this XML field contains multiple events. The trouble is parsing userData: with the schema defined in the usual way, we can retrieve only up to the first item, <item key="ChatID">Reactive</item>. The subsequent items are not captured, and I am out of ideas at the moment.

This is how the schema is defined in DataStage:

/chatTranscript/@startAt
/chatTranscript/@sessionId
/chatTranscript/@savedPosition
/chatTranscript/newParty/@userId
/chatTranscript/newParty/@eventId
/chatTranscript/newParty/@timeShift
/chatTranscript/newParty/@visibility
/chatTranscript/newParty/userInfo/@personId
/chatTranscript/newParty/userInfo/@userNick
/chatTranscript/newParty/userInfo/@userType
/chatTranscript/newParty/userInfo/@protocolType
/chatTranscript/newParty/userInfo/@timeZoneOffset
/chatTranscript/newParty/userData/item/@key
/chatTranscript/newParty/userData/item/text()


So the last two columns, item/@key and item/text(), return only the first value from userData.
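One way to picture this symptom is the rough Python analogy below, which uses the standard library's xml.etree.ElementTree and is not a description of the XML Input stage's internals. If each output row corresponds to one newParty (eventId as the repetition element), a column can hold only a single value per row, so a first-match lookup on userData/item yields only ChatID. The sample is trimmed to three items, the chatTranscript root's own attributes are omitted, and the function and column names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample XML, wrapped in the chatTranscript root
# used by the column XPaths (root attributes omitted for brevity).
SAMPLE = """<chatTranscript>
  <newParty userId="9032738273CDF" eventId="1" timeShift="0" visibility="ALL">
    <userData>
      <item key="ChatID">Reactive</item>
      <item key="ChatURL">contactuschat</item>
      <item key="MediaType">chat</item>
    </userData>
  </newParty>
</chatTranscript>"""


def rows_per_event(xml_text):
    """Hypothetical helper: one row per newParty (repetition on the event).

    Each column holds a single value, so looking up userData/item
    with find() returns only the first matching item.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for party in root.findall("newParty"):
        first_item = party.find("userData/item")  # first match only
        rows.append({
            "eventId": party.get("eventId"),
            "key": first_item.get("key") if first_item is not None else None,
            "value": first_item.text if first_item is not None else None,
        })
    return rows


print(rows_per_event(SAMPLE))
# [{'eventId': '1', 'key': 'ChatID', 'value': 'Reactive'}]
```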


Any pointers would be much appreciated.

Thanks,
Sach

Posted: Wed Aug 17, 2016 8:33 pm
by eostic
In this XML, "item" must be the key, or "repetition element". It is the only element that repeats, so you will get as many rows as you have items. You should still be able to get the user data, repeated identically on each item row, provided it occurs only once for each group of items.
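As a rough analogy only (Python's xml.etree.ElementTree, not the XML Input stage itself), treating item as the repetition element amounts to emitting one row per item and copying the values that occur once per group of items, such as the newParty and userInfo attributes, onto every row. The sample is trimmed to three items, and the helper and column names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample XML (three of the twelve items),
# wrapped in the chatTranscript root used by the column XPaths.
SAMPLE = """<chatTranscript>
  <newParty userId="9032738273CDF" eventId="1" timeShift="0" visibility="ALL">
    <userInfo personId="" userNick="abc" userType="CLIENT"
              protocolType="FLEX" timeZoneOffset="720"/>
    <userData>
      <item key="ChatID">Reactive</item>
      <item key="ChatURL">contactuschat</item>
      <item key="MediaType">chat</item>
    </userData>
  </newParty>
</chatTranscript>"""


def rows_per_item(xml_text):
    """Hypothetical helper: one row per <item> (item as repetition element).

    Values that occur once per group of items are copied onto each row.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for party in root.findall("newParty"):
        info = party.find("userInfo")
        parent = {
            "eventId": party.get("eventId"),
            "userId": party.get("userId"),
            "userNick": info.get("userNick") if info is not None else None,
        }
        for item in party.findall("userData/item"):
            rows.append({**parent, "key": item.get("key"), "value": item.text})
    return rows


for row in rows_per_item(SAMPLE):
    print(row)
```

With the trimmed sample this yields three rows, one each for ChatID, ChatURL and MediaType, all carrying eventId "1" and userNick "abc".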

Ernie

Posted: Thu Aug 18, 2016 4:31 am
by SachinCho
Thanks Ernie! Got the point, and I am able to parse this one now. I was using eventId as the "repetition key" because I have multiple events within a customer chat, and each event in turn contains multiple items. I guess I will have to use item as the key in some cases and the event in others. Exploring more.

Posted: Thu Aug 18, 2016 12:38 pm
by eostic
Indeed. Each "independent" repeating node path needs its own output link and its own "repetition element". Nested repetition is OK, but use separate links when the nodes are unrelated (such as "employees" under "company" vs. "assets", also under "company").
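A minimal Python sketch of the same idea, using a hypothetical company document whose element and attribute names are made up to mirror the example above: each independent repeating path gets its own row set, much as each would get its own output link and repetition element. The cross product at the end only illustrates why a single flat row set is a poor fit for unrelated nodes; it is not a claim about what DataStage does.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: employees and assets both repeat under
# company but are unrelated to each other.
DOC = """<company name="Acme">
  <employee id="e1"/>
  <employee id="e2"/>
  <asset tag="a1"/>
  <asset tag="a2"/>
  <asset tag="a3"/>
</company>"""


def rows_for(root, path, attr):
    """One 'output link': one row per node matched by path, plus the parent value."""
    company = root.get("name")
    return [{"company": company, attr: node.get(attr)} for node in root.findall(path)]


root = ET.fromstring(DOC)
employee_rows = rows_for(root, "employee", "id")  # 2 rows
asset_rows = rows_for(root, "asset", "tag")       # 3 rows

# Illustration only: squeezing both paths into one flat row set, e.g. via a
# cross product, yields 2 x 3 = 6 rows that pair employees with assets,
# a relationship the XML does not express.
combined = [{**e, **a} for e in employee_rows for a in asset_rows]
```

A nested path, such as the items inside each newParty event, is different: each item row can simply carry its parent's eventId, so one row set is enough.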

Ernie