Cannot process xml files using xml input stage.

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

san0907
Participant
Posts: 9
Joined: Thu Jun 25, 2009 1:19 am

Cannot process xml files using xml input stage.

Post by san0907 »

Hi,

I cannot parse XML files of the format below using the XML Input stage. The job runs fine, but no output is written
to the file. I have included the namespace declaration and the repetition key element.

<table1><Detail_Collection><Detail mkt_val_eur="0" rating="N/A" issuer="N/A" master_issuer="N/A" /><Detail mkt_val_eur="222" rating="N/A" issuer="N/A" master_issuer="N/A" /></Detail_Collection></table1>

My XPath expressions, created by the XML data importer, are below:

/ns1:table1/ns1:Detail_Collection/ns1:Detail/@mkt_val_eur
/ns1:table1/ns1:Detail_Collection/ns1:Detail/@rating
/ns1:table1/ns1:Detail_Collection/ns1:Detail/@issuer
/ns1:table1/ns1:Detail_Collection/ns1:Detail/@master_issuer

Job design

External source stage----->xml input stage------>Sequential File

Am I missing something? Could you please help me out on this?
Thanks and Regards
Sana
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

With namespace issues it's sometimes hard to say what might be wrong. First, I'd test by removing the default namespace from the header [if one exists; it is probably something like xmlns="http://....."] and also removing the ns1: from your XPath statements. This will at least tell you if your XPath is valid for the document you are trying to read.
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
san0907
Participant
Posts: 9
Joined: Thu Jun 25, 2009 1:19 am

Post by san0907 »

Thanks eostic for your reply.

I removed the default namespace from the header and ns1: from my XPath statements. I still could not parse the XML document.

Is there any other way to read this file? Please let me know.

Thanks in advance

sana
Thanks and Regards
Sana
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Please post your 'new' XPath expressions and a sample of the edited input XML.
-craig

"You can never have too many knives" -- Logan Nine Fingers
san0907
Participant
Posts: 9
Joined: Thu Jun 25, 2009 1:19 am

Post by san0907 »

Hi,

Below are the modified XML file and XPath expressions. Thanks.

<?xml version="1.0" encoding="utf-8"?><Report p1:schemaLocation="usrep http://r/RS01?%2fExposure+Reports%2fusR ... chema=True" Name="usRep" textbox22="Valuation Date: May 19, 2009"><table1><Detail_Collection><table1><Detail_Collection><Detail mkt_val_eur="0" rating="N/A" issuer="N/A" master_issuer="N/A" /><Detail mkt_val_eur="222" rating="N/A" issuer="N/A" master_issuer="N/A" /></Detail_Collection></table1></Report>


/Report/table1/Detail_Collection/Detail/@mkt_val_eur
/Report/table1/Detail_Collection/Detail/@rating
/Report/table1/Detail_Collection/Detail/@issuer
/Report/table1/Detail_Collection/Detail/@master_issuer
Thanks and Regards
Sana
jgreve
Premium Member
Posts: 107
Joined: Mon Sep 25, 2006 4:25 pm

xml fun - can it get any better?

Post by jgreve »

[edit]some cleanup[/edit]
san0907 wrote:Hi,

Below is the modified xml file and xpath expressions.Thanks.

<?xml version="1.0" encoding="utf-8"?><Report p1:schemaLocation="usrep http://r/RS01?%2fExposure+Reports%2fusR ... chema=True" Name="usRep" textbox22="Valuation Date: May 19, 2009"><table1><Detail_Collection><table1><Detail_Collection><Detail mkt_val_eur="0" rating="N/A" issuer="N/A" master_issuer="N/A" /><Detail mkt_val_eur="222" rating="N/A" issuer="N/A" master_issuer="N/A" /></Detail_Collection></table1></Report>


/Report/table1/Detail_Collection/Detail/@mkt_val_eur
/Report/table1/Detail_Collection/Detail/@rating
/Report/table1/Detail_Collection/Detail/@issuer
/Report/table1/Detail_Collection/Detail/@master_issuer
This is timely for me - I'm building some XPath expressions for
an XML Input stage myself, and have been experiencing the joys
of XML and namespaces. Based on the example XML and the XPath
expressions in your last post, here are the summary points for you.

John G.


<free_advice_summary>
1) don't use broken xml.
2) use the XML In Reject link
3) start with the simplest xpath possible
</free_advice_summary>

Note that I find the xml processing in Datastage to be crippled at best.
It is tedious and error prone to work with.
Be patient and take small steps; good luck!

Here are some details and observations for you...

1) don't use broken xml.

Here is what your XML looks like with a little human-friendly formatting.
I see that you have attribute ReportTag.schemaLocation but...
it is actually ReportTag.p1:schemaLocation.
I'm betting that DataStage is unhappy with the "p1:" namespace prefix.
It isn't enough to just have a xmlns:p1="http://blah.blah.blah"
in DataStage's Namespace Declarations; it also has to be in the
actual XML Document.

Code: Select all

<?xml version="1.0" encoding="utf-8"?>
<Report
   p1:schemaLocation="usrep http://r/RS01?%2fExposure+Reports%2fusRep&rs%3aCommand=Render&rs%3aFormat=XML&rs%3aSessionID=ytlmrhjiwad3twrggvws0h55&rc%3aSchema=True"
   Name="usRep"
   textbox22="Valuation Date: May 19, 2009">
<table1>
<Detail_Collection>
   <table1>
      <Detail_Collection>
      <Detail mkt_val_eur="0" rating="N/A" issuer="N/A" master_issuer="N/A" />
      <Detail mkt_val_eur="222" rating="N/A" issuer="N/A" master_issuer="N/A" />
   </Detail_Collection>
   </table1>
</Report>
What the heck, let's try it: I fired up a test job to run this through and
sure enough, the XMLIn stage was unhappy.


2) use the XML In Reject link
Here is an interesting trick you should try: add a Reject link to
your XMLInput stage.
For development, it would be enough to add a link from the XMLInput to a Peek.
Call your Peek stage something like "xml_rejects".
Add a 255 varchar message column (I used "msg" for the column name).
In the XMLIn stage, go to the Output tab and select the "xml_rejects" link,
then choose "msg" as the column name to receive your rejects.

Now you will see interesting things in the job log:
For example, from the first run, with your XML sample "as-is":

Code: Select all

   xml_rejects,0: msg:XML input document parsing failed.
   Reason: Xalan fatal error (publicId: , systemId: , line: 1, column: 278):
   The prefix 'p1' has not been mapped to any URI
Second run, with the "p1:" problem in your XML sample fixed
[edit](On reviewing this, I noticed that I actually just deleted the
"p1:" part rather than adding a proper namespace declaration, but I bet
it would work with an xmlns:p1="etc..." embedded in the XML as well.)[/edit]

Code: Select all

   xml_rejects,0: msg:XML input document parsing failed.  Reason: Xalan fatal error (publicId:
   , systemId: , line: 1, column: 548): Expected end of tag 'Detail_Collection'

Oh - that is interesting, I didn't even look at the ending tags when
I reformatted the XML above.
Let's take a closer look now:

Oh - your nesting structure is kind of messed up.

With your original document, we have this:

Code: Select all

<?xml version="1.0" encoding="utf-8"?>
<Report
   p1:schemaLocation="us..."
   Name="usRep"
   textbox22="Valuation Date: May 19, 2009">
<table1>
   <Detail_Collection>
      <table1>
         <Detail_Collection>
            <Detail mkt_val_eur="0" rating="N/A" issuer="N/A" master_issuer="N/A" />
            <Detail mkt_val_eur="222" rating="N/A" issuer="N/A" master_issuer="N/A" />
         </Detail_Collection>
      </table1>
</Report> 
It looks to me like there are redundant start tags.
Let's clean that up now, and try it this way:

Code: Select all

<?xml version="1.0" encoding="utf-8"?>
<Report
   schemaLocation="us..."
   Name="usRep"
   textbox22="Valuation Date: May 19, 2009">
<!-- Ignore these 2 tags...
   <table1>
   <Detail_Collection>
-->
      <table1>
         <Detail_Collection>
            <Detail mkt_val_eur="0" rating="N/A" issuer="N/A" master_issuer="N/A" />
            <Detail mkt_val_eur="222" rating="N/A" issuer="N/A" master_issuer="N/A" />
         </Detail_Collection>
      </table1>
</Report> 

Ok, that was better.
No reject messages this time.


3) start with the simplest xpath possible
Finally, for testing I started with a pretty simple xpath expression:
description="/Report/@Name"

Great, it pulled out Name="usRep".

Now that it is finding the Report.Name attribute,
we'll try something fancy:
/Report/table1/Detail_Collection/Detail/@mkt_val_eur

Ahh, very nice - it pulled out the "0" and the "222" for Detail.mkt_val_eur.
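
For anyone who wants to try the same two lookups outside DataStage, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is only an illustration, not what the XML Input stage does internally. ElementTree's XPath subset cannot end a path in /@attribute, so the sketch selects the elements and reads the attributes with get(). The document is the cleaned-up sample from above with the shortened schemaLocation value.

```python
import xml.etree.ElementTree as ET

# Cleaned-up sample from this post (redundant start tags removed, "p1:" deleted).
DOC = """<?xml version="1.0" encoding="utf-8"?>
<Report schemaLocation="us..." Name="usRep" textbox22="Valuation Date: May 19, 2009">
  <table1>
    <Detail_Collection>
      <Detail mkt_val_eur="0" rating="N/A" issuer="N/A" master_issuer="N/A" />
      <Detail mkt_val_eur="222" rating="N/A" issuer="N/A" master_issuer="N/A" />
    </Detail_Collection>
  </table1>
</Report>"""

root = ET.fromstring(DOC.encode("utf-8"))  # root is the <Report> element

# Equivalent of /Report/@Name
report_name = root.get("Name")

# Equivalent of /Report/table1/Detail_Collection/Detail/@mkt_val_eur
details = root.findall("./table1/Detail_Collection/Detail")
mkt_vals = [d.get("mkt_val_eur") for d in details]

print(report_name, mkt_vals)
```

If the simple lookup works and the longer one returns nothing, the path (not the connection or the file) is the thing to look at.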


So, in summary I would say:
1) don't use broken xml. Use valid xml (at least during development,
validate your XML for well formedness in xmlspy or something before
giving it to DataStage).

2) use the XML In Reject Link. Sometimes a little feedback can be very helpful.

3) start with the simplest xpath expression you possibly can, then build it up
once all the "simple" things like connections, xml format, name spaces, etc. are working.
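
If you don't have xmlspy handy, a rough well-formedness check can be sketched in a few lines of Python with the standard library parser. The function name check_well_formed and the two shortened sample documents are made up for this illustration; the parser is expat, not Xalan, so the wording of the messages will differ from what the reject link shows.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return False, f"line {line}, column {column}: {err}"
    return True, None

# Shortened stand-ins for the samples in this thread.
BROKEN = ("<Report><table1><Detail_Collection><table1><Detail_Collection>"
          "<Detail mkt_val_eur='0'/></Detail_Collection></table1></Report>")
FIXED = ("<Report><table1><Detail_Collection>"
         "<Detail mkt_val_eur='0'/></Detail_Collection></table1></Report>")

broken_ok, broken_msg = check_well_formed(BROKEN)
fixed_ok, fixed_msg = check_well_formed(FIXED)
print(broken_ok, broken_msg)
print(fixed_ok, fixed_msg)
```

Running something like this on the vendor file before the job sees it gives the same kind of feedback as the reject link, just earlier.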


Again, good luck - the lack of default feedback from the XML stages
can be pretty frustrating.
Last edited by jgreve on Sun Jul 05, 2009 10:44 am, edited 1 time in total.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

awesome post! :)
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
san0907
Participant
Posts: 9
Joined: Thu Jun 25, 2009 1:19 am

Post by san0907 »

Thanks jgreve and eostic for your replies.

I could parse the xml document after removing namespace declarations
from xml file and xpath expressions.

Could you please let me know why DataStage won't accept the namespace-prefixed attribute ReportTag.p1:schemaLocation? The vendor will not send me a modified XML file with the namespace removed for the daily processing.

I have no other option than to process the original file with its namespace declarations.

Thanks
Sana
Thanks and Regards
Sana
jgreve
Premium Member
Posts: 107
Joined: Mon Sep 25, 2006 4:25 pm

limited options

Post by jgreve »

Sana - if your source will not change the document, then your options are limited.

First, do you understand why the document you posted is broken? This is a key point. You must understand why the document you posted is broken before you can make progress. I'm guessing it is unclear to you, so I will try to explain a little more. (My apologies if this is repetitious for you.)

The thing is, the XML document you posted only uses the namespace prefix "p1:"; it never declares it.

The XML document must also declare "p1:" with something like this:
xmlns:p1="long_namespace_here"

This must happen inside the XML document.
For example, consider this very short XML document:

Code: Select all

<a p1:x="1"/>
By itself, this document uses the namespace prefix "p1:" but it is incomplete (i.e. broken) because it doesn't declare what p1: is.


I believe this document will work:

Code: Select all

<a p1:x="1" xmlns:p1="long_namespace_here"/>
Does that make sense?
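
The two short documents above can be checked with any namespace-aware parser. As a minimal sketch, here they are run through Python's standard xml.etree.ElementTree (expat underneath, not the Xalan parser DataStage uses, so the error text will be worded differently):

```python
import xml.etree.ElementTree as ET

UNDECLARED = '<a p1:x="1"/>'
DECLARED = '<a p1:x="1" xmlns:p1="long_namespace_here"/>'

# The first document uses the prefix without declaring it, so parsing fails.
try:
    ET.fromstring(UNDECLARED)
    undeclared_parsed = True
except ET.ParseError as err:
    undeclared_parsed = False
    print("rejected:", err)

# The second document declares the prefix, so it parses.
elem = ET.fromstring(DECLARED)
# ElementTree stores namespaced attributes under "{namespace}localname".
value = elem.get("{long_namespace_here}x")
print(value)
```

Note that the declaration has to be in the document itself; telling the tool about p1 separately does not make the first document parse.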

If so, then you are ready to think about your options. If your sender is really not giving you the p1: namespace declaration in the document, you will have to convince them to change their document, or else change it yourself before you work on it.

By the way, did you post the entire XML document that you are getting from your source? Or only an edited sample? If you are not working with a complete, unaltered original document from your source you may very well be wasting time. I would encourage you to make sure you have the complete document.

Good luck, Sana
pavan31081980
Participant
Posts: 17
Joined: Sun Mar 19, 2006 5:46 am
Location: vja

Re: Cannot process xml files using xml input stage.

Post by pavan31081980 »

Hi Sana,

Please try this in the description of the xml input stage

After the Xpath in the description, include- /text()
i.e Xpath/text()

Cheers,
Pavan

san0907
Participant
Posts: 9
Joined: Thu Jun 25, 2009 1:19 am

Post by san0907 »

Thanks a lot jgreve for your detailed reply.

I removed the namespaces from the XML document for testing, as suggested by eostic, and that is what left the document broken. My actual document header is below.

It contains the namespace declarations you mentioned: xmlns:p1="http://www.w3.org/2001/XMLSchema-instance" xmlns="USARep". I could not process the original file
using the XPath expressions with namespaces, even with the namespaces declared in the stage and in the output transformation settings.


<?xml version="1.0" encoding="utf-8"?><Report p1:schemaLocation="USARep http://rpt.uspimdev/RS01?%2fExposure+Re ... chema=True" Name="USARep" textbox22="Valuation Date: May 19, 2009" xmlns:p1="http://www.w3.org/2001/XMLSchema-instance" xmlns="USARep"><table1><Detail_Collection><table1><Detail_Collection><Detail mkt_val_eur="0" rating="N/A" issuer="N/A" master_issuer="N/A" /><Detail mkt_val_eur="222" rating="N/A" issuer="N/A" master_issuer="N/A" /></Detail_Collection></table1></Report>


xpath.

/ns1:table1/ns1:Detail_Collection/ns1:Detail/@mkt_val_eur
/ns1:table1/ns1:Detail_Collection/ns1:Detail/@rating
/ns1:table1/ns1:Detail_Collection/ns1:Detail/@issuer
/ns1:table1/ns1:Detail_Collection/ns1:Detail/@master_issuer

namespace declarations.

xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:p1="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:ns1="USARep"
xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"





I added a reject link and am getting the error message below.

Peek_447,0: REJECT:XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 0, column: 0): An exception occurred! Type:NetAccessorException, Message:Could not open file: {0}
Xalan fatal error (publicId: , systemId: /ETLSHARES/bosdevfs01/et

Pavan:

I will try your suggestion.

Thanks
sana
Thanks and Regards
Sana
jgreve
Premium Member
Posts: 107
Joined: Mon Sep 25, 2006 4:25 pm

did you really post the complete xml document?

Post by jgreve »

san0907 wrote:Thanks a lot jgreve for your detailed reply.

I removed the namespaces from xml document as suggested by eostic for testing and it makes the actual xml document broken.My actual document header is below.

When you removed the p1: namespace declaration from the
document, I think you also needed to remove the p1: prefix
from the actual elements and attributes that use it.

At any rate, below are some more notes.

However: it looks to me like your XML document is still broken.
Can you post a complete, unmodified version of the XML you
are trying to work with?

Notes about the latest example xml:
That 'schemaLocation' attribute looks like a headache.
It seems to me that your XML source people are making
things more complicated than they need to be.
*shrug*
Let's see what we can do about it.

The xalan error "can't open file" is strange.

Code: Select all

REJECT:XML input document parsing failed.
Reason: Xalan fatal error (publicId: , systemId: , line: 0, column: 0): An exception occurred!
Type:NetAccessorException, Message:Could not open file: {0} 
Hmm... I wonder if it is trying to access a schema file?
Are you using xml validation in DataStage?


The "schemaLocation" value is very long; that makes it hard to
see the overall document structure and make examples, so I am going
to abbreviate it as "http://rpt.uspimdev..."
That will give us:

Code: Select all

original:  schemaLocation="USARep http://rpt.uspimdev/RS01?%2fExposure+Reports%2fUSARepo&rs%3aCommand=Render&rs%3aFormat=XML&rs%3aSessionID=ytlmrhjiwad3twrggvws0h55&rc%3aSchema=True"
shortened: schemaLocation="USARep http://rpt.uspimdev..."
So - I did some reading on XML & schemaLocation.
It looks like schemaLocation is a namespace + URL hack.

Code: Select all

In an instance document, the attribute xsi:schemaLocation provides
 hints from the author to a processor regarding the location of schema
 documents. The author warrants that these schema documents are relevant
 to checking the validity of the document content, on a namespace
 by namespace basis.
excerpt from http://www.stylusstudio.com/w3c/schema0/schemaLocation.htm
So... this is not another way to declare XML namespaces; it pairs a
namespace name with a URL where a schema for that namespace can be found.
The big difference is that a normal xmlns:foo="bar" does not
require that "bar" be a live URL; it is just dead string text.
*sigh* It sounds like somebody really wanted "live URLs" that
they could pull schemas from.
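
To make the "namespace + URL" shape concrete: the attribute value is a whitespace-separated list of pairs, each a namespace name followed by a location hint. Below is a minimal Python sketch that splits such a value into pairs. The document and the schema URL (http://example.invalid/USARep.xsd) are placeholders invented for this illustration, not the vendor's real values.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical one-element document; the schema URL is a placeholder.
DOC = ('<Report xmlns="USARep" xmlns:p1="%s" '
       'p1:schemaLocation="USARep http://example.invalid/USARep.xsd"/>' % XSI)

root = ET.fromstring(DOC)

# The prefix (p1, xsi, ...) does not matter; only the namespace it maps to.
raw = root.get("{%s}schemaLocation" % XSI)

# Tokens alternate: namespace name, location hint, namespace name, ...
tokens = raw.split()
hints = dict(zip(tokens[0::2], tokens[1::2]))

print(root.tag)
print(hints)
```

A parser that is not validating can ignore these hints entirely; one that is validating may try to fetch the URL.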

So with the shortened version of "schemaLocation", let's
see if we can get your XML working. Here is the xml document
with some formatting.
(note: I can't figure out how to do color highlighting inside a [ code ] block, so the leading periods are just to force the indenting. If anyone knows a better
way to do this, please let me know. Anyway, it works well enough to make the point.)
<?xml version="1.0" encoding="utf-8"?>
<Report
....p1:schemaLocation="USARep http://rpt.uspimdev..."
....Name="USARep"
....textbox22="Valuation Date: May 19, 2009"
....xmlns:p1="http://www.w3.org/2001/XMLSchema-instance"
....xmlns="USARep">
...<table1>
......<Detail_Collection>
.........<table1>
............<Detail_Collection>
...............<Detail mkt_val_eur="0" rating="N/A" issuer="N/A" master_issuer="N/A" />
...............<Detail mkt_val_eur="222" rating="N/A" issuer="N/A" master_issuer="N/A" />
............</Detail_Collection>
.........</table1>

....<!-- the inner <table1>...</table1> is fine, but
..........there should be closing tags here for the outer <Detail_Collection> and <table1>.
....-->

</Report>

So, it looks like your document is still broken.
Where are the closing tags for the outer <table1><Detail_Collection>?
Can you post the actual complete unmodified XML document you are working with?

I think you can ignore the schemaLocation thing.
Your namespace declarations also look fine, although I don't know
how "smart" the Xalan XML parser is about resolving them.
If we follow the chain all the way, we get:
ns1 -> USARep -> http://rpt.uspimdev...

Code: Select all

namespace declarations. (in DataStage)
xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:p1="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:ns1="USARep"
xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"

I would expect what you have to work as-is, except for the
part about your XPaths not actually matching your document
structure.

Code: Select all

xpath.
/ns1:table1/ns1:Detail_Collection/ns1:Detail/@mkt_val_eur
/ns1:table1/ns1:Detail_Collection/ns1:Detail/@rating
/ns1:table1/ns1:Detail_Collection/ns1:Detail/@issuer
/ns1:table1/ns1:Detail_Collection/ns1:Detail/@master_issuer
If your document structure is correct, what you really need is
something like this:

Code: Select all

/ns1:Report/ns1:table1/ns1:Detail_Collection/ns1:table1/ns1:Detail_Collection/ns1:Detail/@mkt_val_eur
Can you post a complete, unedited version of your xml document?
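
In the meantime, here is a minimal Python sketch of why the paths matter. It assumes the two missing closing tags really belong at the end (that is a guess until the complete document is posted) and maps ns1 to "USARep" the same way the DataStage namespace declarations do. The helper name values is made up for this example, and the paths are relative to the Report root element.

```python
import xml.etree.ElementTree as ET

# Assumption: the outer </Detail_Collection></table1> have been added so this parses.
DOC = """<Report xmlns="USARep" xmlns:p1="http://www.w3.org/2001/XMLSchema-instance" Name="USARep">
  <table1><Detail_Collection>
    <table1><Detail_Collection>
      <Detail mkt_val_eur="0" rating="N/A" issuer="N/A" master_issuer="N/A" />
      <Detail mkt_val_eur="222" rating="N/A" issuer="N/A" master_issuer="N/A" />
    </Detail_Collection></table1>
  </Detail_Collection></table1>
</Report>"""

NS = {"ns1": "USARep"}  # same prefix-to-namespace mapping as in the stage
root = ET.fromstring(DOC)  # root is the <Report> element

def values(path):
    """Collect mkt_val_eur from every element matched by path (relative to Report)."""
    return [d.get("mkt_val_eur") for d in root.findall(path, NS)]

# No prefixes: the default namespace puts every element in "USARep", so nothing matches.
unprefixed = values("table1/Detail_Collection/table1/Detail_Collection/Detail")

# Prefixed but too shallow: Detail is not a direct child of the outer Detail_Collection.
shallow = values("ns1:table1/ns1:Detail_Collection/ns1:Detail")

# Prefixed and following the real nesting.
nested = values("ns1:table1/ns1:Detail_Collection/ns1:table1/ns1:Detail_Collection/ns1:Detail")

print(unprefixed, shallow, nested)
```

Only the last path returns the two values, which is the same point as above: the path has to match both the namespace and the actual nesting.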
John G.