XML Input Stage Error While parsing the XML document



WBlaustein
Premium Member
Posts: 24
Joined: Fri Oct 28, 2011 10:55 am


Post by WBlaustein »

Hello All,

I am having a problem parsing the data from an XML column.

My job looks like this:

DataSet1--->XML Input Stage---->Dataset2

My XML data is in a column in Dataset1.

I included the namespace declaration for the XML Input Stage.

I haven't checked the "validate input XML" checkbox in my XML Input Stage.

I am getting the following error message in my Director log:

xmlinp_getheader_data,0: Fatal Error: Fatal: XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 0, column: 0): An exception occurred! Type:MalformedURLException, Message:The URL used an unsupported protocol
Xalan fatal error (publicId: , systemId: , line: 4, column: 310): Fatal error encountered during schema scan
Xalan fatal error (publicId: , systemId: , line: 4, column: 310): Fatal error encountered during schema scan
Xalan fatal error (publicId: , systemId: , line: 0, column: 0): An exception occurred! Type:MalformedURLException, Message:The URL used an unsupported protocol

I am trying to figure out what might be causing this error.

--
Thanks,
Bill
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

If the XML content is actually "in" a column coming over on the link from the Data Set stage, then be sure, in the XML Input stage, to check the box that says "xml content". It is not the default. It sounds like the stage is taking whatever string is in that column and trying to use it as a URL to locate the XML document.

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
WBlaustein

Post by WBlaustein »

Thanks for the response, Ernie.

I had already set the column content to "XML Document" when I created the job.

The issue I observed is with the schema file location that the document points to.

Below is the root element of my XML document. As you can see, its xsi:schemaLocation attribute references a schema file that I don't have access to.

<CA-Return returnVersion="2012v3.1" xmlns="http://www.ftb.ca.gov/efile" xmlns:irs="http://www.irs.gov/efile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ftb.ca.gov/efile file:///efile/PIT_RETURNS/CA-Return540.xsd">


I really don't want to validate the incoming XML document against the schema file (I have also unchecked the "Validate Input XML" option in my stage); I just need to grab a few columns from <CA-Header>.

My high-level XML structure looks like this:

<CA-Return>
<CA-Header>...</CA-Header>
<CA-Data>...</CA-Data>
</CA-Return>

Is there a way I can bypass the schema file reference?

--
Thanks,
eostic

Post by eostic »

The easiest thing to do, especially if you are already consuming the entire string and passing it as a column value, is to just "zap" it. Pass the string through a BASIC Transformer and either pull off each of the pieces or just use something like Ereplace to blow away the whole schemaLocation attribute and its value, so that the parser has no schema reference left to resolve.
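
As a rough illustration of the "zap it" idea, here is a minimal sketch in Python rather than DataStage BASIC. It only shows the string edit itself: removing the xsi:schemaLocation attribute and its value before the document is handed to a parser. The function name strip_schema_location and the regular expression are assumptions made for this example, not part of any DataStage API, and the pattern assumes the attribute uses the xsi prefix as in the root element quoted earlier in this thread.

```python
import re

# Matches one xsi:schemaLocation attribute, its quoted value, and the
# whitespace in front of it. The "xsi" prefix is an assumption taken from
# the sample root element; a document may bind a different prefix.
SCHEMA_LOCATION = re.compile(
    r"""\s+xsi:schemaLocation\s*=\s*("[^"]*"|'[^']*')"""
)


def strip_schema_location(xml_text: str) -> str:
    """Return xml_text with every xsi:schemaLocation attribute removed."""
    return SCHEMA_LOCATION.sub("", xml_text)


# Root element as posted in the thread.
root = (
    '<CA-Return returnVersion="2012v3.1" '
    'xmlns="http://www.ftb.ca.gov/efile" '
    'xmlns:irs="http://www.irs.gov/efile" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.ftb.ca.gov/efile '
    'file:///efile/PIT_RETURNS/CA-Return540.xsd">'
)

cleaned = strip_schema_location(root)
print(cleaned)
```

The namespace declarations are left untouched, so the element and column paths defined in the XML Input stage should still resolve; only the pointer to the inaccessible .xsd file is dropped.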

Ernie
Ernie Ostic
