XML Parsing Query
Posted: Wed Aug 17, 2016 1:56 am
Hi All,
I have been trying to parse an XML document using a parallel job with the following design. We are on v9.1.
Row Generator (holds the value of a sequential file with only a single column, which is the XML) ---> XML Input ---> Sequential File
I am trying to parse only a single tag (newParty) from the XML, for which the XSD looks like this:
<xs:element name='newParty'>
<xs:complexType>
<xs:attribute name='eventId' use='required'/>
<xs:attribute name='timeShift' use='required'/>
<xs:attribute name='userId' use='required'/>
<xs:attribute name='visibility' use='required'>
<xs:simpleType>
<xs:restriction base='xs:string'>
<xs:enumeration value='ALL'/>
<xs:enumeration value='INT'/>
<xs:enumeration value='VIP'/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:sequence>
<xs:element ref='userInfo' minOccurs='0'/>
<xs:element ref='userData' minOccurs='0'/>
</xs:sequence>
</xs:complexType>
</xs:element>
userData in turn has the following declaration in the same XSD:
<xs:element name='userData'>
<xs:complexType>
<xs:sequence>
<xs:element ref='item' minOccurs='0' maxOccurs='unbounded'/>
</xs:sequence>
</xs:complexType>
</xs:element>
The sample XML is as follows:
<newParty userId="9032738273CDF" eventId="1" timeShift="0" visibility="ALL">
<userInfo personId="" userNick="abc" userType="CLIENT" protocolType="FLEX" timeZoneOffset="720" />
<userData>
<item key="ChatID">Reactive</item>
<item key="ChatURL">contactuschat</item>
<item key="EmailAddress">abc@gmail.com</item>
<item key="FirstName">abc</item>
<item key="FromAddress">abc@gmail.com</item>
<item key="IdentifyCreateContact">3</item>
<item key="MediaType">chat</item>
<item key="MessageCount">Agent:0|Customer:0</item>
<item key="PhoneNumber">123456789</item>
<item key="Question">qyery1</item>
<item key="Subject">qyery1</item>
<item key="TopicID">topic1</item>
</userData>
</newParty>
We are using eventId as the repetition element key because this XML field contains multiple events. The trouble is parsing userData: with the standard schema definition we can retrieve only up to <item key="ChatID">Reactive</item>. Subsequent items are not captured, and I am out of ideas at the moment.
This is how the schema is defined in DataStage:
/chatTranscript/@startAt
/chatTranscript/@sessionId
/chatTranscript/@savedPosition
/chatTranscript/newParty/@userId
/chatTranscript/newParty/@eventId
/chatTranscript/newParty/@timeShift
/chatTranscript/newParty/@visibility
/chatTranscript/newParty/userInfo/@personId
/chatTranscript/newParty/userInfo/@userNick
/chatTranscript/newParty/userInfo/@userType
/chatTranscript/newParty/userInfo/@protocolType
/chatTranscript/newParty/userInfo/@timeZoneOffset
/chatTranscript/newParty/userData/item/@key
/chatTranscript/newParty/userData/item/text()
So the last two columns, item/@key and item/text(), return only the first value from userData.
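To make the expected result concrete, here is a minimal sketch outside DataStage, in Python with the standard xml.etree.ElementTree module, of the output being aimed for: one row per item, with the parent newParty attributes repeated on each row. The sample is trimmed to three items, the chatTranscript wrapper is assumed from the XPath list above (its attributes are omitted), and flatten_items is a hypothetical helper name. It says nothing about how the XML Input stage works internally.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample; the <chatTranscript> wrapper is an assumption
# taken from the XPath list, and its attributes are left out.
SAMPLE = """
<chatTranscript>
  <newParty userId="9032738273CDF" eventId="1" timeShift="0" visibility="ALL">
    <userInfo personId="" userNick="abc" userType="CLIENT" protocolType="FLEX" timeZoneOffset="720" />
    <userData>
      <item key="ChatID">Reactive</item>
      <item key="ChatURL">contactuschat</item>
      <item key="MediaType">chat</item>
    </userData>
  </newParty>
</chatTranscript>
"""


def flatten_items(xml_text):
    """Return one row per <item>, repeating the parent newParty attributes."""
    root = ET.fromstring(xml_text)
    rows = []
    for party in root.findall("newParty"):
        # Every <item> under userData yields its own row, not just the first.
        for item in party.findall("userData/item"):
            rows.append({
                "eventId": party.get("eventId"),
                "userId": party.get("userId"),
                "key": item.get("key"),
                "value": item.text,
            })
    return rows


rows = flatten_items(SAMPLE)
for row in rows:
    print(row)
```

In this sketch the row count follows the innermost repeating element (item), not newParty. Whether moving the repetition element key down to the item level in the stage gives the same effect is an assumption I have not verified on v9.1.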
Any pointers would be much appreciated
Thanks,
Sach