Page 1 of 1

Ampersand sign in XML Stage Causing error

Posted: Fri Jul 05, 2013 4:22 pm
by darrreever
Hello DataStagers:

I am reading a string from a 9.7DB2 table. The string contains the ampersand "&" sign. (an example "Johnson & Johnson")When I pass the VARCHAR string to an XML Input Stage (not the 8.5 XML Stage) with fatal, error, and warning set to go to a reject link, the record gets rejected.

The same behaviour happens with the !@#$% symbols as well. However, when I use NVARCHAR as the string data type, all of the special symbols pass through fine. The record containing the & is still rejected.

I understand that the "&" has special meaning in both XML and HTML and so this is probably the issue. As you know, the symbols I listed, as well as &, are all part of the ascii set and not double byte or require unicode handling.

Has anyone experienced/solved this problem?

Thanks
God Bless

Posted: Fri Jul 05, 2013 5:05 pm
by chulett
That ampersand needs to be replaced with & to be valid in XML data.

Posted: Fri Jul 05, 2013 6:09 pm
by darrreever
Hello Mr. Craig!

Thanks for the quick response! Right on the money, yet again! :D

You are exactly right. Easily checked by simply putting the xml message in notepad, saving as an .xml, and viewing in IE or other browser. With just a '&' the xml will not view, but replacing '&' with '&' (no quotes) it displays perfectly.

Now comes the perennial problem of how to search for more than one occurence of a string and replace with another string in a parallel job.

Once again thanks.

Posted: Fri Jul 05, 2013 8:02 pm
by ray.wurlod
The EReplace() function can replace multiple instances. If you're earlier than 9.1, track down the pxEReplace() function published on DSXchange.

Posted: Tue Jul 09, 2013 12:33 pm
by rameshrr3
replace all double quotes ", ampersand &, apostrophe ' [single quotes] ,greater-than >, less-than < signs with &quot,&amp,&apos,&gt,&lt respectively . I guess XML parser considers these as reserved characters. Create a function that replaces these characters one after another - possibly using cascaded Change() or Ereplace() for server jobs or use some equivalent in c/C++ for a custom parallel routine, the DB2 SQL function library should also have some functions to replace characters.

Posted: Tue Jul 09, 2013 2:40 pm
by eostic
All of this is a correct discussion about escape characters and xml, but it would be useful to add some more context....

The initial post notes:

"I am reading a string from a 9.7 DB2 table. The string contains the ampersand sign........When I pass the VARCHAR string to an XMLInput Stage......."

So... is the "string" that is coming from DB2 already an xml document or chunk of xml content?

If so, then yes, everything said here is correct --- however, the tooling that generated the string is wrong --- it should be converting that ampersand automatically BEFORE it loads it into db2.....

Indeed, if DataStage xmlOutput Stage (or xml stage) was being used to "generate" this xml, it would be escaped automatically....

Ernie

Posted: Tue Jul 09, 2013 3:00 pm
by rameshrr3
I have seen some XML where the element data had ampersands, evidently not all xml generating tools make these conversions. Interstingly even datastage 8.0x had a bug when you defined an environment variable whose value is enclosed in opening and closing tag pair like <env_value>.

The one question to the OP , is did you set the data element for this column to 'XML' ?

Thanks
Ramesh