XML Input Issue

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
wesd
Participant
Posts: 22
Joined: Mon Aug 16, 2004 8:56 pm

XML Input Issue

Post by wesd »

Hello,

I'm having an issue with XML Input. I've read the XML Pack documents but an not an expert at XML files. My question is with the repetition key.

I used XML importer to define the data file.

The XML file I'm reading looks like this (I've cut out the first record, but there are 12 records per XML file).

<?xml version="1.0" encoding="UTF-8" ?>
- <FormattedReport xmlns="urn:crystal-reports:schemas" xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance">
- <FormattedAreaPair Level="0" Type="Report">
- <FormattedArea Type="Header">
- <FormattedSections>
- <FormattedSection SectionNumber="0">
<FormattedReportObjects />
</FormattedSection>
</FormattedSections>
</FormattedArea>
- <FormattedAreaPair Level="1" Type="Group">
- <FormattedArea Type="Footer">
- <FormattedSections>
- <FormattedSection SectionNumber="0">
- <FormattedReportObjects>
- <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:decimal" FieldName="{@MsgLock}">
<ObjectName>Field29</ObjectName>
<FormattedValue>0</FormattedValue>
<Value>0.00</Value>
</FormattedReportObject>
- <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:decimal" FieldName="{@MsgAcDel}">
<ObjectName>Field24</ObjectName>
<FormattedValue>4139</FormattedValue>
<Value>4139.00</Value>
</FormattedReportObject>
- <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:decimal" FieldName="{@MsgAcHeard}">
<ObjectName>Field22</ObjectName>
<FormattedValue>4138</FormattedValue>
<Value>4138.00</Value>
</FormattedReportObject>
- <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:decimal" FieldName="{@MsgAcAvgMin}">
<ObjectName>Field21</ObjectName>
<FormattedValue>0</FormattedValue>
<Value>0.00</Value>
</FormattedReportObject>
- <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:decimal" FieldName="{@MsgAcTime}">
<ObjectName>Field20</ObjectName>
<FormattedValue>0</FormattedValue>
<Value>0.00</Value>
</FormattedReportObject>
- <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:decimal" FieldName="{@MsgAcCnt}">
<ObjectName>Field19</ObjectName>
<FormattedValue>1</FormattedValue>
<Value>1.00</Value>
</FormattedReportObject>
- <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:decimal" FieldName="{@MsgRvAvg}">
<ObjectName>Field13</ObjectName>
<FormattedValue>0.00</FormattedValue>
<Value>0.00</Value>
</FormattedReportObject>
- <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:decimal" FieldName="{@MsgRvLen}">
<ObjectName>Field17</ObjectName>
<FormattedValue>0.00</FormattedValue>
<Value>0.00</Value>
</FormattedReportObject>
- <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:decimal" FieldName="{@MsgRvCnt}">
<ObjectName>Field12</ObjectName>
<FormattedValue>0</FormattedValue>
<Value>0.00</Value>
</FormattedReportObject>
- <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:decimal" FieldName="{@MsgMCnt}">
<ObjectName>Field9</ObjectName>
<FormattedValue>0</FormattedValue>
<Value>0.00</Value>
</FormattedReportObject>
- <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:decimal" FieldName="{@MsgRAvg}">
<ObjectName>Field10</ObjectName>
<FormattedValue>0.20</FormattedValue>
<Value>0.20</Value>
</FormattedReportObject>
- <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:decimal" FieldName="{@MsgRLen}">
<ObjectName>Field11</ObjectName>
<FormattedValue>0.50</FormattedValue>
<Value>0.50</Value>
</FormattedReportObject>
- <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:decimal" FieldName="{@MsgRCnt}">
<ObjectName>Field6</ObjectName>
<FormattedValue>2</FormattedValue>
<Value>2.00</Value>
</FormattedReportObject>
- <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:string" FieldName="{@MailBoxnameheader}">
<ObjectName>Field7</ObjectName>
<FormattedValue>111111111,. (1000)</FormattedValue>
<Value>111111111,. (1000)</Value>
</FormattedReportObject>
- <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:date" FieldName="PrintDate">
<ObjectName>Field2</ObjectName>
<FormattedValue>Print Date: 2008-01-08</FormattedValue>
<Value>2008-01-08</Value>
</FormattedReportObject>
- <FormattedReportObject xsi:type="CTFormattedField" Type="xsd:string" FieldName="{@CompanyName}">
<ObjectName>Field3</ObjectName>
<FormattedValue>061</FormattedValue>
<Value>061</Value>
</FormattedReportObject>
</FormattedReportObjects>
</FormattedSection>
</FormattedSections>
</FormattedArea>
</FormattedAreaPair>


What I'm trying to do is pull out the Fields and place each record into a row with the fields as the columns. The way I have it set up now each field creates a new row.

Does anyone have a thought as to what I should use for the repetition key.

Thanks for your help.
Wes Dumey
Senior Consultant
Data Warehouse Projects
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Hi Wes...

Not exactly sure what you are looking for, but let's describe what the "normal" behavior should be, and then go from there.

From what I can see above, outside of what looks like some lead-in header type info, the document is primarily made up of "FormattedReportObjects"..... that element repeats, with some elements inside of it, such as FormattedValue. Assuming that FormattedReportObjects is the important information, the default behavior is that you should receive as many rows as there are FormattedReportObject elements.....each one getting its own row, and the columns being each of the sub-elements of that element. Choose any one of them as the key. I'd probably choose FormattedValue.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Hi Wes...

Not exactly sure what you are looking for, but let's describe what the "normal" behavior should be, and then go from there.

From what I can see above, outside of what looks like some lead-in header type info, the document is primarily made up of "FormattedReportObjects"..... that element repeats, with some elements inside of it, such as FormattedValue. Assuming that FormattedReportObjects is the important information, the default behavior is that you should receive as many rows as there are FormattedReportObject elements.....each one getting its own row, and the columns being each of the sub-elements of that element. Choose any one of them as the key. I'd probably choose FormattedValue.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
wesd
Participant
Posts: 22
Joined: Mon Aug 16, 2004 8:56 pm

Post by wesd »

Thanks Ernie. I've done what you described, but the issue is that each of the FormattedValue object is a column in a record, but is output as a separate row. Each record begins with

- <FormattedAreaPair Level="1" Type="Group">
*****columns here (FormattedValues)
</FormattedAreaPair>

So, I need to figure out how to get that element to repeat and populate like

FormattedValue1 FormattedValue2 FormattedValue3 ..... FormattedValueN
1 2 3 .... N

So far I have not been able to figure this out. Thoughts?

Wes
Wes Dumey
Senior Consultant
Data Warehouse Projects
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Post your XPath Expressions, please.
-craig

"You can never have too many knives" -- Logan Nine Fingers
wesd
Participant
Posts: 22
Joined: Mon Aug 16, 2004 8:56 pm

Here's the screen shot

Post by wesd »

Image
Wes Dumey
Senior Consultant
Data Warehouse Projects
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Hi Wes... you can't.

Not with the XMLInput Stage alone. I see this a lot when people choose the "easy way out" in designing their XML document. Instead of using meta data names to describe the elements, they have an attribute or other sub-element that defines the column name, and then still another for the value. One strength of XML is that the tags can be the true meta data names.

Enough said....it simply means that this document is set up to return multiple rows, one for each "column" in the desired structure.... Get them into multiple rows, and then look thru the forum to find your favorite "reverse pivot" technique.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Hi Wes... you can't.

Not with the XMLInput Stage alone. I see this a lot when people choose the "easy way out" in designing their XML document. Instead of using meta data names to describe the elements, they have an attribute or other sub-element that defines the column name, and then still another for the value. One strength of XML is that the tags can be the true meta data names.

Enough said....it simply means that this document is set up to return multiple rows, one for each "column" in the desired structure.... Get them into multiple rows, and then look thru the forum to find your favorite "reverse pivot" technique.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Specificall, this 'reverse pivot' to search for would be a vertical pivot. You'll have to do it manually as the Pivot stage only does a 'horizontal' pivot or columns to rows.
-craig

"You can never have too many knives" -- Logan Nine Fingers
wesd
Participant
Posts: 22
Joined: Mon Aug 16, 2004 8:56 pm

Post by wesd »

Thanks for your help guys. The client decided to import the files in CSV format. We'll explore (properly formatted) XML with another project.
Wes Dumey
Senior Consultant
Data Warehouse Projects
Post Reply