problem with XML Input Stage

somu_june
Premium Member
Posts: 439
Joined: Wed Sep 14, 2005 9:28 am
Location: 36p,reading road

Post by somu_june »

Hi,

I have an XML document. I'm using a Folder stage to read the XML file and passing it to the XML Input stage, but I'm getting the warning below in my job:

: XML input document parsing failed. Reason: Xalan fatal error (publicId: , systemId: , line: 2, column: 96333757): The input ended before all started tags were ended. Last tag started was 'directory'


My XML data is as below:

<?xml version="1.0" encoding="UTF-8" ?>
<directory>
<customer name="Acme" zone="Midwest">
<division>Chemical</division>
<city>St. Louis</city>
</customer>
</directory>



In the XML Input stage, on the Output tab's Columns tab, I defined:

ROOT_ID   Varchar(255)  Nullable  /directory
Division  Varchar(20)   Nullable  /directory/division/text()
CITY      Varchar(18)   Nullable  /directory/city/text()


I don't know if I'm doing anything wrong in my XML Input stage. Please correct me if I am.


Thanks,
Raju
somaraju
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Hard to say, but if that is your entire document, I might suspect some unprintable character in the "whitespace". Look at it in hex: you may have something other than hex 20s where blanks should be, or something other than CRLF or LF for end of line (hex '0D' and '0A').

Or, with something this simple, just type it over again in Notepad or another editor and, to be safe, leave out the CRLFs altogether.
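As a rough way to do the hex check described above outside DataStage, here is a minimal Python sketch. The function name `find_suspect_bytes` and the sample bytes are made up for illustration. It flags every byte that is not printable ASCII, tab, LF (0x0A), or CR (0x0D), so legitimate non-ASCII UTF-8 characters in the data would be reported too.

```python
def find_suspect_bytes(data: bytes):
    """Return (offset, byte value) pairs for bytes that are not
    printable ASCII, tab, LF (0x0A), or CR (0x0D)."""
    allowed_controls = {0x09, 0x0A, 0x0D}
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if not (0x20 <= value <= 0x7E or value in allowed_controls)
    ]


# Hypothetical sample: a 0xA0 byte sits where a blank (0x20) was expected.
sample = b'<directory>\xa0<customer name="Acme"/>\r\n</directory>'

for offset, value in find_suspect_bytes(sample):
    print(f"offset {offset}: 0x{value:02X}")
```

For a real file, the bytes can be read with `open(path, "rb").read()` and passed to the same function.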

As for syntax, there are some things missing in your XPath, so at best you will get zero records once you get past the error. division and city are contained within the customer element:

/directory/customer/division/text()
/directory/customer/city/text()
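To see why the shorter paths would return nothing, here is a small sketch using Python's standard `xml.etree.ElementTree` rather than the XML Input stage itself, so it only illustrates the path logic. The paths passed to `findall` are relative to the root `directory` element.

```python
import xml.etree.ElementTree as ET

doc = """<directory>
<customer name="Acme" zone="Midwest">
<division>Chemical</division>
<city>St. Louis</city>
</customer>
</directory>"""

root = ET.fromstring(doc)  # root is the <directory> element

# Equivalent of /directory/division: no match, because division
# is not a direct child of directory.
without_customer = root.findall("division")

# Equivalent of /directory/customer/division: matches.
with_customer = root.findall("customer/division")

print(len(without_customer))             # 0
print([e.text for e in with_customer])   # ['Chemical']
```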

Ernie
Ernie Ostic

blogit! Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
somu_june
Premium Member
Posts: 439
Joined: Wed Sep 14, 2005 9:28 am
Location: 36p,reading road

Post by somu_june »

Hi eostic,

Thanks for the reply. The problem was a missing end tag in my XML: the closing directory tag was missing, which caused the error above.

<directory>
<customer name="Acme" zone="Midwest">
<division>Chemical</division>
<city>St. Louis</city>
</customer>
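A quick well-formedness check before running the job would catch this kind of truncation. The following is only a sketch using Python's standard library, not anything DataStage provides, and the helper name `check_well_formed` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text: str):
    """Return None if text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return str(err)


# The document as posted above, with the closing directory tag missing.
broken = """<directory>
<customer name="Acme" zone="Midwest">
<division>Chemical</division>
<city>St. Louis</city>
</customer>
"""

print(check_well_formed(broken))                   # an error message
print(check_well_formed(broken + "</directory>"))  # None
```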


After fixing the XML, the job runs fine for 2,000 records. When I run the same XML with 20,000 records, it has been running for 4 hours and the monitor shows no activity at all. How can I improve performance when parsing a large XML file? I'm using a server job with a Folder stage to read the XML file.


Thanks,
Raju
somaraju
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You only see 'activity' when a file has been completely processed, so things look pretty normal when you parse many small files but one large file can look like a whole lot of nothing until it is done and then - bang. Thar she blows! :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

How big is it? 20k "rows" from this is not that much...are you saying that there are simply 20k customers in this one single xml document, or 20k "documents" that you are reading from disk initially (each with their own set of customer elements)?
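One way to answer that outside DataStage is to count the customer elements in the file without loading the whole document at once. This is a sketch only, using Python's standard `xml.etree.ElementTree.iterparse`. The function name `count_elements` and the in-memory sample are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(source, tag="customer"):
    """Count elements named `tag` while streaming through the document.

    `source` may be a file name or a binary file object.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
        # Drop this element's text, attributes, and children
        # to limit memory growth on large files.
        elem.clear()
    return count


# Hypothetical two-customer document standing in for the real file.
sample = io.BytesIO(
    b"<directory>"
    b"<customer><division>A</division></customer>"
    b"<customer><division>B</division></customer>"
    b"</directory>"
)
print(count_elements(sample))  # 2
```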

Ernie
Ernie Ostic