[edit]some cleaup[/edit]
san0907 wrote:Hi,
Below is the modified xml file and xpath expressions.Thanks.
<?xml version="1.0" encoding="utf-8"?><Report p1:schemaLocation="usrep
http://r/RS01?%2fExposure+Reports%2fusR ... chema=True" Name="usRep" textbox22="Valuation Date: May 19, 2009"><table1><Detail_Collection><table1><Detail_Collection><Detail mkt_val_eur="0" rating="N/A" issuer="N/A" master_issuer="N/A" /><Detail mkt_val_eur="222" rating="N/A" issuer="N/A" master_issuer="N/A" /></Detail_Collection></table1></Report>
/Report/table1/Detail_Collection/Detail/@mkt_val_eur
/Report/table1/Detail_Collection/Detail/@rating
/Report/table1/Detail_Collection/Detail/@issuer
/Report/table1/Detail_Collection/Detail/@master_issuer
This is timely for me - I'm building some xpath expresions for
an XML Input stage myself, and have been experiencing the joys
of XML and NameSpaces. Based on the example XML and the xpath
expressions in your last post, here are the summary points for you.
John G.
<free_advice_summary>
1) don't use broken xml.
2) use the
XML In Reject link
3) start with the simplest xpath possible
</free_advice_summary>
Note that I find the xml processing in Datastage to be crippled at best.
It is tedious and error prone to work with.
Be patient and take small steps; good luck!
Here are some details and observations for you...
1) don't use broken xml.
Here is what your XML looks like with a little human-friendly formatting.
I see that you have attribute
ReportTag.schemaLocation but...
it is actually
ReportTag.p1:schemaLocation.
I'm betting that DataStage is unhappy with the "p1:" namespace prefix.
It isn't enough to just have a xmlns:p1="
http://blah.blah.blah"
in DataStage's Namespace Declarations; it also has to be in the
actual XML Document.
Code: Select all
<?xml version="1.0" encoding="utf-8"?>
<Report
p1:schemaLocation="usrep http://r/RS01?%2fExposure+Reports%2fusRep&rs%3aCommand=Render&rs%3aFormat=XML&rs%3aSessionID=ytlmrhjiwad3twrggvws0h55&rc%3aSchema=True"
Name="usRep"
textbox22="Valuation Date: May 19, 2009">
<table1>
<Detail_Collection>
<table1>
<Detail_Collection>
<Detail mkt_val_eur="0" rating="N/A" issuer="N/A" master_issuer="N/A" />
<Detail mkt_val_eur="222" rating="N/A" issuer="N/A" master_issuer="N/A" />
</Detail_Collection>
</table1>
</Report>
What the heck, let's try it: I fired up a test job to run this through and
sure enough, the XMLIn stage was unhappy.
2) use the XML In Reject link
Here is an interesting trick you should try: add a Reject link to
your XMLInput stage.
For development, it would be enough to add a link from the XMLInput to a Peek.
Call your Peek stage something like "xml_rejects".
Add a 255 varchar message column (I used "msg" for the column name).
In the XMLIn stage, go to the Ouput tab and select the "xml_rejects" link,
then choose "msg" as the column name to receive your rejects.
Now you will see interesting things in the job log:
For example, from the first run, with your XML sample "as-is":
Code: Select all
xml_rejects,0: msg:XML input document parsing failed.
Reason: Xalan fatal error (publicId: , systemId: , line: 1, column: 278):
The prefix 'p1' has not been mapped to any URI
Second run, with your XML sample with a proper name space added to
the XML Document [edit](Actually on reviewing this, I noticed that
I just deleted the "p1:" part - but I bet it would work with embedding
an xmlns:p1="etc..." in the XML as well.)[edit]
Code: Select all
xml_rejects,0: msg:XML input document parsing failed. Reason: Xalan fatal error (publicId:
, systemId: , line: 1, column: 548): Expected end of tag 'Detail_Collection'
Oh - that is interesting, I didn't even look at the ending tags when
I reformatted the XML above.
Let's take a closer look now:
Oh - your nesting structure is kind of messed up.
With your original document, we have this:
Code: Select all
<?xml version="1.0" encoding="utf-8"?>
<Report
p1:schemaLocation="us..."
Name="usRep"
textbox22="Valuation Date: May 19, 2009">
<table1>
<Detail_Collection>
<table1>
<Detail_Collection>
<Detail mkt_val_eur="0" rating="N/A" issuer="N/A" master_issuer="N/A" />
<Detail mkt_val_eur="222" rating="N/A" issuer="N/A" master_issuer="N/A" />
</Detail_Collection>
</table1>
</Report>
It looks to me like there are redundant start tags.
Let's clean that up now, and try it this way:
Code: Select all
<?xml version="1.0" encoding="utf-8"?>
<Report
schemaLocation="us..."
Name="usRep"
textbox22="Valuation Date: May 19, 2009">
<!-- Ignore these 2 tags...
<table1>
<Detail_Collection>
-->
<table1>
<Detail_Collection>
<Detail mkt_val_eur="0" rating="N/A" issuer="N/A" master_issuer="N/A" />
<Detail mkt_val_eur="222" rating="N/A" issuer="N/A" master_issuer="N/A" />
</Detail_Collection>
</table1>
</Report>
Ok, that was better.
No reject messages this time.
3) start with the simplest xpath possible
Finally, for testing I started with a pretty simple xpath expression:
description="/Report/@Name"
Great, it pulled out Name="usRep".
Now that it is finding the Report.Name attribute,
we'll try something fancy:
/Report/table1/Detail_Collection/Detail/@mkt_val_eur
Ahh, very nice - it pulled out the "0" and the "222" for Detail.mkt_val_eur.
So, in summary I would say:
1) don't use broken xml. Use valid xml (at least during development,
validate your XML for well formedness in xmlspy or something before
giving it to DataStage).
2) use the XML In Reject Link. Sometimes a little feedback can be
very helpful.
3) start with the simplest xpath expression you possibly can, then build it up
once all the "simple" things like connections, xml format, name spaces, etc. are working.
Again, good luck - the lack of default feedback from the XML stages
can be pretty frustrating.