Problem with XML Input Stage

Posted: Tue Mar 02, 2010 6:17 pm
by somu_june
Hi,

I have an XML document. I'm using a Folder stage to read the XML file and pass it to the XML Input stage for parsing, but I'm getting the warning below in my job:

: XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 2, column: 96333757): The input ended before all started tags were ended. Last tag started was 'directory'


My XML data is as follows:

<?xml version="1.0" encoding="UTF-8" ?>
<directory>
<customer name="Acme" zone="Midwest">
<division>Chemical</division>
<city>St. Louis</city>
</customer>
</directory>



In the XML Input stage, on the Output tab's Columns tab, I defined:

ROOT_ID varchar (255) Nullable /directory
Division Varchar(20) Nullable /directory/division/text()
CITY varchar(18) Nullable /directory/city/text()


I don't know if I'm doing anything wrong in my XML Input stage. Please correct me if I am.


Thanks,
Raju

Posted: Tue Mar 02, 2010 11:10 pm
by eostic
Hard to say, but if that is your entire document, I might suspect some unprintable character in the "whitespace". Look at it in hex: you may have something other than hex 20's where blanks should be, or something other than CRLF or LF for end of line (hex '0D' and '0A').

Or, with something this simple, just type it over again in Notepad or another editor and, to be safe, leave out the CRLFs altogether.
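
If a hex editor isn't handy, the same check can be sketched in a few lines of Python. This is only an illustration, not part of DataStage: the function name and the sample bytes are made up, and it assumes the document should contain nothing but printable ASCII plus tab, CR and LF. Legitimate multi-byte UTF-8 characters would be flagged too, so treat the hits as things to look at rather than as errors.

```python
def find_suspicious_bytes(data: bytes):
    """Return (offset, hex value) for every byte that is not
    printable ASCII (0x20-0x7E), tab (0x09), LF (0x0A) or CR (0x0D)."""
    allowed_controls = {0x09, 0x0A, 0x0D}
    return [
        (offset, f"{value:02X}")
        for offset, value in enumerate(data)
        if not (0x20 <= value <= 0x7E or value in allowed_controls)
    ]


# Hypothetical sample: a NUL byte and a 0xA0 byte hiding in the "whitespace".
sample = b'<directory>\x00\n<customer name="Acme"/>\xa0</directory>'
hits = find_suspicious_bytes(sample)
for offset, hex_value in hits:
    print(f"offset {offset}: hex {hex_value}")
```

For a real file, read it in binary mode (open(path, "rb").read()) so the bytes are not decoded or newline-translated before they are inspected.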

As for syntax, there are some things missing in your XPath, so at best you will get zero records once you get past the error. division and city are contained within the customer element:

/directory/customer/division/text()
/directory/customer/city/text()
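
The difference between the two sets of paths can be tried outside DataStage. The sketch below uses Python's xml.etree.ElementTree, which is an assumption for illustration only: its path syntax is relative to the root element and has no text() step, so "customer/division" stands in for /directory/customer/division/text() and the text is read from each matched element.

```python
import xml.etree.ElementTree as ET

# The sample document from the first post.
doc = """<directory>
<customer name="Acme" zone="Midwest">
<division>Chemical</division>
<city>St. Louis</city>
</customer>
</directory>"""

root = ET.fromstring(doc)  # root is the <directory> element

# Equivalent of /directory/division: division is not a direct child of directory.
without_customer = [e.text for e in root.findall("division")]

# Equivalent of /directory/customer/division and /directory/customer/city.
divisions = [e.text for e in root.findall("customer/division")]
cities = [e.text for e in root.findall("customer/city")]

print(without_customer, divisions, cities)
```

The path that skips customer matches nothing, which is the "zero records" case; the paths that go through customer return the values.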

Ernie

Posted: Wed Mar 03, 2010 3:39 pm
by somu_june
Hi eostic,

Thanks for the reply. The problem was that the end tag was missing from my XML: the closing </directory> tag was not there, so I got the error above.

<directory>
<customer name="Acme" zone="Midwest">
<division>Chemical</division>
<city>St. Louis</city>
</customer>
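
A missing end tag like this can be caught before the job runs with any XML parser. Here is a minimal sketch in Python, assuming xml.etree.ElementTree is available; the helper name check_well_formed is made up for the example, and the parser's error wording will differ from the Xalan message above.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return None if text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as exc:
        return str(exc)


# The document as it was, without the closing </directory> tag.
broken = """<directory>
<customer name="Acme" zone="Midwest">
<division>Chemical</division>
<city>St. Louis</city>
</customer>
"""

fixed = broken + "</directory>"

print("broken:", check_well_formed(broken))
print("fixed:", check_well_formed(fixed))
```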


After fixing the XML, the job runs fine for 2,000 records. When I run it against an XML file with 20,000 records, it has been running for 4 hours and the monitor shows no activity at all. How can I improve performance when parsing a large XML file? I'm using a server job with a Folder stage to read the XML file.


Thanks,
Raju

Posted: Wed Mar 03, 2010 6:48 pm
by chulett
You only see 'activity' when a file has been completely processed, so things look pretty normal when you parse many small files but one large file can look like a whole lot of nothing until it is done and then - bang. Thar she blows! :wink:

Posted: Wed Mar 03, 2010 7:53 pm
by eostic
How big is it? 20k "rows" from this is not that much...are you saying that there are simply 20k customers in this one single xml document, or 20k "documents" that you are reading from disk initially (each with their own set of customer elements)?
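
One way to answer the "how big is it" question without waiting on the job is to count the customer elements with a streaming parser. This is a rough sketch in Python, not a DataStage feature: the function name is made up, the second customer in the sample is invented for the demo, and a real run would pass the path of the XML file instead of the in-memory sample.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(source, tag="customer"):
    """Count elements named tag, reading the document as a stream."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
        # Drop the element's text, attributes and children once it has been seen.
        elem.clear()
    return count


# Made-up sample with two customers; pass a file path for a real document.
sample = io.BytesIO(
    b"<directory>"
    b"<customer><city>St. Louis</city></customer>"
    b"<customer><city>Chicago</city></customer>"
    b"</directory>"
)
customer_count = count_elements(sample)
print(customer_count)
```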

Ernie