XML stage Error due to special character
Moderators: chulett, rschirm, roy
These things can get a bit strange, because they often depend on the encoding scheme. What is at the top of your document? Does it have an explicit encoding declaration, such as UTF-8? You may need to add one, or declare a different encoding altogether. A quick web search should turn up a variety of possibilities. Can you open the document successfully in IE or Firefox?
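For checking what is at the top of the document outside a browser, here is a minimal Python sketch. The helper names `declared_encoding` and `decodes_cleanly` are made up for illustration, and the sketch assumes an ASCII-compatible encoding with no byte order mark, so it would not handle a UTF-16 file. It reads the encoding named in the XML declaration and reports whether the raw bytes actually decode with it.

```python
import re

# Matches the encoding pseudo-attribute inside the XML declaration only.
_DECL = re.compile(rb'''<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']''')


def declared_encoding(raw: bytes) -> str:
    """Return the encoding named in the XML declaration.

    Falls back to UTF-8, the XML default when no encoding is declared
    (and no byte order mark is present).
    """
    match = _DECL.match(raw)
    return match.group(1).decode("ascii") if match else "utf-8"


def decodes_cleanly(raw: bytes) -> bool:
    """True if the bytes are valid in the encoding the document declares."""
    try:
        raw.decode(declared_encoding(raw))
        return True
    except (UnicodeDecodeError, LookupError):
        # LookupError covers an encoding name Python does not know.
        return False
```

A document that declares one encoding but was saved in another is a common reason for a parser to reject a special character.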
I've done it in the other direction, with hard-coded character references. Here's a snippet of XML for writing, in case it offers any ideas for your issue...
<?xml version ="1.0"?>
<mydoc>
<string> is this a copyright symbol? &#169; </string>
</mydoc>
This shows up fine in Mozilla Firefox. There may be other alternatives. Decimal 169 (hex A9) is the code point for the copyright symbol.
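As a quick check outside the browser, the same idea can be tried with Python's standard-library parser. This is only a sketch: it assumes the snippet uses the decimal character reference `&#169;`, and it says nothing about how the DataStage XML stage behaves.

```python
import xml.etree.ElementTree as ET

# The document itself is pure ASCII; the copyright symbol is written
# as a decimal character reference, so no encoding question arises.
doc = (
    '<?xml version="1.0"?>'
    "<mydoc>"
    "<string> is this a copyright symbol? &#169; </string>"
    "</mydoc>"
)

root = ET.fromstring(doc)
text = root.find("string").text

# The parser resolves the reference to the real character U+00A9.
has_copyright = "\u00a9" in text
code_point = ord("\u00a9")  # 169 decimal, A9 hex
```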
Ernie
Ernie Ostic
blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
Did some testing. Perhaps this will help you in your research.
The environment is Windows XP, with DataStage 8.1 and FP1.
The XML document has a literal copyright character in it (as in your first posting) and also the decimal character reference for the copyright symbol.
IE 7, in my environment, blows up on the copyright symbol, saying it cannot read it, unless the declaration specifies encoding="iso-8859-1". Oddly enough, it blows up with UTF-8 and UTF-16. I'm not sure why at this point; there are a lot of variables. DataStage is the real issue anyway, but this is something you should always do first in these cases: find out whether the browser(s) in your environment can read the document.
DataStage doesn't seem to care on this machine, which has a code page of 1252. The document reads just fine, with no errors, using any of the encodings above, although with iso-8859-1 the output also includes a stray accented capital A (most likely Â) in front of the copyright symbol. No doubt that comes from the two bytes that represent a copyright character in UTF-8 being interpreted as two separate single-byte characters.
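The byte-level picture behind that stray character can be sketched in a few lines of Python. This only illustrates the encodings themselves; it makes no claim about what IE or DataStage do internally, and `is_valid_utf8` is a helper name invented for the example.

```python
copyright_sign = "\u00a9"

# UTF-8 needs two bytes for the copyright symbol; ISO-8859-1 and
# Windows code page 1252 need only one.
utf8_bytes = copyright_sign.encode("utf-8")          # b'\xc2\xa9'
latin1_bytes = copyright_sign.encode("iso-8859-1")   # b'\xa9'
cp1252_bytes = copyright_sign.encode("cp1252")       # b'\xa9'

# Reading the UTF-8 bytes as ISO-8859-1 turns them into two characters:
# U+00C2 (capital A with circumflex) followed by the copyright symbol.
misread = utf8_bytes.decode("iso-8859-1")


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes form well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

The reverse mismatch fails harder: a lone 0xA9 byte is not valid UTF-8, so a file saved in code page 1252 but declared as UTF-8 cannot be decoded at all.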
I'm not an expert on NLS, so I'm not sure whether it is a factor here. You can experiment with your encodings. There are a lot of variables, from the encoding, to the environment, to the NLS settings, to the release of DS. I would suggest testing with a document that has ONLY the decimal copyright reference in it, to see if that works. That's the brute-force method, and it should work in all cases with DS, IE, etc. Worst case, you can "zap" all the copyright characters to the decimal string in an upstream Transformer.
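The "zap" idea, shown here in Python rather than as Transformer logic, might look like the sketch below. The function name `to_char_refs` is invented for the example, and it converts every non-ASCII character, not just the copyright symbol, which may be more than you want.

```python
def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with its decimal character reference.

    The result is pure ASCII, so it reads the same whichever of the
    common ASCII-compatible encodings the document declares.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = to_char_refs("(c) \u00a9 2009")
```

This is meant for element text. Markup characters such as `<` and `&` are left alone, so it does not replace normal XML escaping.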
Ernie
I had a similar issue with "special" characters in XML. The XML would not open in IE and would not parse in the XML stage, so, based on our requirements, I decided to convert the non-standard XML characters to blanks. Since the XML was generated by an application that allowed users to enter anything they wanted in a field, I had to allow for unwanted characters ending up in the XML. I used a simple routine to clean up the XML prior to parsing it in the XML stage.
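The routine itself isn't shown here, so the following Python sketch is only a guess at the general shape. It assumes "non-standard" means characters that XML 1.0 does not allow at all, with an optional switch (`ascii_only`, a made-up parameter) that also blanks everything outside ASCII.

```python
import re

# Characters outside the XML 1.0 "Char" production: most control
# characters, lone surrogates, and U+FFFE / U+FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)
_NON_ASCII = re.compile(r"[^\x00-\x7F]")


def blank_out(text: str, ascii_only: bool = False) -> str:
    """Replace characters the parser may reject with a single blank each."""
    cleaned = _ILLEGAL_XML_CHARS.sub(" ", text)
    if ascii_only:
        # Stricter mode: also blank legal but non-ASCII characters.
        cleaned = _NON_ASCII.sub(" ", cleaned)
    return cleaned
```

Blanking loses information, so this only suits cases like the one described, where the requirements allow the unwanted characters to be dropped.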
My thoughts.
Jason