Hello,
I am trying to read XML files, but I manage to do so only
if the header encoding is UTF-8 and the file itself is saved as
ANSI.
If I save the XML file as UTF-8 using the encoding option in the Notepad editor (the XML header remains UTF-8), the job finishes with a warning about unknown characters
and doesn't read any data.
The problem with reading an XML file as ANSI is that one cannot
open the file in an internet browser, because the browser expects
the encoding of the file to be consistent with its
XML header.
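To make the mismatch concrete, here is a minimal sketch that checks whether a file's bytes actually decode with the encoding named in its XML header. The function names are made up for illustration, and this only approximates what a browser or parser does; it is not how DataStage itself decides.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

def declared_encoding(raw: bytes) -> str:
    """Return the encoding named in the XML declaration (lowercased).

    Falls back to 'utf-8' when no encoding is declared.
    """
    # Skip a UTF-8 byte order mark if the editor wrote one.
    head = raw[len(UTF8_BOM):] if raw.startswith(UTF8_BOM) else raw
    m = re.match(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._\-]+)["\']', head)
    return m.group(1).decode("ascii").lower() if m else "utf-8"

def decodes_as_declared(raw: bytes) -> bool:
    """True if the bytes can be decoded with the declared encoding."""
    try:
        raw.decode(declared_encoding(raw))
        return True
    except (UnicodeDecodeError, LookupError):
        # LookupError: Python does not know the declared encoding name.
        return False
```

A file saved as ANSI (Windows-1255) with Hebrew text and a UTF-8 header fails this check, which matches what the browser complains about. Note that the check is one-sided: a single-byte encoding such as Windows-1255 accepts almost any byte sequence, so a passing result does not prove the header is right.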
I understand IBM recommends reading XML files with encoding="UTF-8"
in their header, but why doesn't DS support the file itself being encoded
as UTF-8, and instead insists that the XML file be saved as ANSI to work properly?
Is there a possible solution to this problem?
Thanks in advance,
Zeev
PROBLEMS WITH ENCODING XML INPUT FILES
Hi...can you be more explicit? Perhaps show here the different headers (one that works and one that doesn't), and what error you are receiving from DataStage?
Ernie
Ernie Ostic
blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
Re-reading your entry...I initially missed the note about specifically saving the file in Notepad as UTF-8. Very interesting. I created a few XML documents using that same method, and they get the three leading bytes (EF BB BF) that Notepad writes to mark a file as UTF-8, and so far every online validation tool I have checked says that the document, because of these three bytes, is invalid. This one would take some digging into the formal XML specification, how Microsoft looks at it (the file opens fine as XML in IE), and how Apache looks at it (under the covers, DataStage uses the IBM-sanctioned versions of Apache Xalan and Xerces to do its work).
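For anyone who wants to check their own files for those three bytes, a minimal sketch follows. The helper names and the file name in the usage note are made up for illustration, and whether removing the bytes is enough to make the DataStage warning go away is an assumption, not something confirmed here.

```python
import codecs

def has_utf8_bom(raw: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark EF BB BF."""
    return raw.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(raw: bytes) -> bytes:
    """Return the data without a leading UTF-8 byte order mark, if any."""
    return raw[len(codecs.BOM_UTF8):] if has_utf8_bom(raw) else raw
```

To apply it to a file, read the bytes with `open("input.xml", "rb").read()`, pass them through `strip_utf8_bom`, and write the result back in binary mode (`"wb"`) so nothing re-encodes the content.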
What is the reason you cannot use ANSI?
Ernie
The reason why I can't use ANSI:
I assume you meant to ask why I can't use ANSI with a "WINDOWS-1255"
XML header.
Of course I can use a "UTF-8" header with ANSI, and DS works fine, but
then the file will not open correctly in the browser, which makes it a little more difficult to QA the data.
The reason I cannot use a "WINDOWS-1255" header is that my data has
Hebrew characters.
DS does read Hebrew characters correctly, but only when using a "UTF-8" header.
From what I understand, because browsers insist that the XML header correspond exactly to the file encoding, the only solution is a way to make DS correctly read a file with a "WINDOWS-1255" XML header, including Hebrew characters.
Alternatively, if DS could correctly read a "UTF-8" encoded file (with a "UTF-8"
XML header), this would also solve the problem.
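As a possible way to get the second option, here is a minimal sketch that converts a Windows-1255 (ANSI) file into UTF-8 bytes without the three leading bytes Notepad adds, and sets the header to "UTF-8" so the two agree. The function name is made up for illustration, and it is only an assumption that DS will accept the result; I have not been able to confirm that.

```python
import re

def cp1255_to_utf8(raw: bytes) -> bytes:
    """Re-encode Windows-1255 XML bytes as UTF-8 (no byte order mark).

    The encoding named in the XML declaration, if present, is set to UTF-8.
    """
    text = raw.decode("cp1255")
    # Rewrite only the encoding value inside the first XML declaration.
    text = re.sub(
        r'(<\?xml[^>]*?encoding\s*=\s*["\'])[^"\']+(["\'])',
        r"\g<1>UTF-8\g<2>",
        text,
        count=1,
    )
    # str.encode("utf-8") never writes a byte order mark.
    return text.encode("utf-8")
```

The input should be read in binary mode and the output written in binary mode, so that no editor or runtime re-encodes the bytes along the way.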
Thanks in advance for all the effort...