PROBLEMS WITH ENCODING XML INPUT FILES


Moderators: chulett, rschirm, roy

ZEEVS1970
Participant
Posts: 6
Joined: Wed Mar 08, 2006 5:57 am

PROBLEMS WITH ENCODING XML INPUT FILES

Post by ZEEVS1970 »

Hello,

I am trying to read XML files, but I can only do so
if the header encoding is UTF-8 and the file itself is saved as
ANSI.

If I save the XML file as UTF-8 using the encoding option in the Notepad editor (the XML header remains UTF-8), the job finishes with a warning about unknown characters
and doesn't read any data.

The problem with reading an XML file as ANSI is that one cannot
open the file in an internet browser, because the browser expects
the encoding of the file to be consistent with its
XML header.
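
To make the mismatch concrete, here is a minimal Python sketch (not something DataStage or the browser runs) that checks whether a file's bytes can be decoded with the encoding named in its XML header. The function names `declared_encoding` and `header_matches_bytes` are made up for illustration. A successful decode is only a necessary condition: a single-byte code page such as WINDOWS-1255 accepts many byte sequences that are really UTF-8, so this catches the "UTF-8 header, ANSI bytes" case but not every mismatch.

```python
import re

def declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or the default if none is found."""
    m = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', data)
    return m.group(1).decode("ascii") if m else default

def header_matches_bytes(data: bytes) -> bool:
    """True if the bytes decode cleanly with the declared encoding (necessary, not sufficient)."""
    try:
        data.decode(declared_encoding(data))
    except (UnicodeDecodeError, LookupError):
        return False
    return True

# Illustrative document: UTF-8 header, one Hebrew letter (U+05E9) in the body.
doc = '<?xml version="1.0" encoding="UTF-8"?><n>\u05e9</n>'
ansi_bytes = doc.encode("cp1255")  # header says UTF-8, bytes are WINDOWS-1255 ("ANSI")
utf8_bytes = doc.encode("utf-8")   # header and bytes agree
```

Here `header_matches_bytes(ansi_bytes)` is False, because the Hebrew byte is not valid UTF-8, while `header_matches_bytes(utf8_bytes)` is True.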

I understand IBM recommends reading XML files with encoding="UTF-8"
in their header, but why doesn't DS support UTF-8 as the encoding of the file itself,
and instead insists that the XML file be saved as ANSI to work properly?

Is there a possible solution to this problem?

Thanks in advance,

Zeev
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Hi...can you be more explicit? Perhaps show here the different headers (one that works and one that doesn't), and the error that you are receiving from DataStage?

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Re-reading your entry...I initially missed the note about specifically saving the file in Notepad as UTF-8. Very interesting. I created a few XML documents the same way, and they get the three leading bytes (EF BB BF) indicated as necessary for UTF-8, and so far every online validation tool I have checked on the web says that the document is invalid because of these three bytes. This one would take some digging into the formal XML specification, into how Microsoft looks at it (the file opens fine as XML in IE), and into how Apache looks at it (under the covers, DataStage uses the IBM-sanctioned versions of Apache Xalan and Xerces to do its work).
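
If those three bytes do turn out to be what trips up the parser (an assumption, nothing in this thread confirms it yet), one way to test it would be to strip them before the job reads the file. A minimal Python sketch, where `strip_utf8_bom` is a hypothetical helper name and the sample document is made up:

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Illustrative input: what Notepad's "UTF-8" save option puts in front of the document.
sample = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><root/>'
cleaned = strip_utf8_bom(sample)
```

Running the job against both the original and the cleaned bytes would show whether the leading bytes alone cause the warning.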

What is the reason you cannot use ANSI?

Ernie
ZEEVS1970
Participant
Posts: 6
Joined: Wed Mar 08, 2006 5:57 am

The reason why I can't use ANSI

Post by ZEEVS1970 »

I assume you meant to ask why I can't use ANSI with a "WINDOWS-1255"
XML header.
Of course I can use a "UTF-8" header with ANSI, and DS works fine, but
then the file will not open correctly in the browser, which makes it a little more difficult to QA the data.

The reason I cannot use a "WINDOWS-1255" header is that my data has
Hebrew characters.
DS does read Hebrew characters correctly, but only when using a "UTF-8" header.

From what I understand, because browsers insist that the XML header correspond exactly to the file encoding, the only solution is a way to make DS correctly read a file with a "WINDOWS-1255" XML header, including Hebrew characters.
Alternatively, if DS could correctly read a "UTF-8" encoded file (with a "UTF-8"
XML header), this would also solve the problem.
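
For the QA side only, a possible stopgap is to leave the ANSI file for DS and produce a separate, consistently UTF-8 copy for the browser. This is a rough Python sketch under that assumption. The function name `ansi_to_utf8` is made up, and it assumes the ANSI bytes really are WINDOWS-1255 (Python codec `cp1255`). It does not make DS itself read the UTF-8 file.

```python
import re

def ansi_to_utf8(data: bytes, source_encoding: str = "cp1255") -> bytes:
    """Decode ANSI bytes, set the XML declaration to UTF-8, and re-encode without a BOM."""
    text = data.decode(source_encoding)
    # Rewrite only the encoding value in the XML declaration, keeping the original quote style.
    text = re.sub(
        r'(<\?xml[^>]*encoding=)(["\'])[^"\']*\2',
        r'\g<1>\g<2>UTF-8\g<2>',
        text,
        count=1,
    )
    return text.encode("utf-8")  # the "utf-8" codec does not write a byte order mark

# Illustrative input: a WINDOWS-1255 document containing four Hebrew letters.
hebrew = "\u05e9\u05dc\u05d5\u05dd"
sample = ('<?xml version="1.0" encoding="WINDOWS-1255"?><name>' + hebrew + '</name>').encode("cp1255")
converted = ansi_to_utf8(sample)
```

The converted bytes carry a UTF-8 header and UTF-8 content, so the browser should open them, while the original file stays untouched for the job.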

Thanks in advance for all the effort...