XML stage Error due to special character

eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

These things can get a bit strange, because they often depend on the encoding scheme. What is at the top of your document? Does it explicitly declare UTF-8? It may need that declaration included, or even a different encoding. A quick web search should turn up a variety of possibilities. Can you open the document successfully in IE or Firefox?

I've done it in the other direction, with hard-coded character references. Here's a snippet of XML for writing, in case it offers any ideas for your issue...

<?xml version ="1.0"?>
<mydoc>
<string> is this a copyright symbol? &#x0a9; </string>
</mydoc>

This shows up fine in Mozilla Firefox. There may be other alternatives. Decimal 169 (hex 0a9) is the code point for the copyright symbol.
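For anyone who wants to generate references like the one above rather than typing them by hand, here is a minimal Python sketch. It is only an illustration, not something DataStage does for you: the helper name `to_char_refs` is made up, and it relies on Python's standard `xmlcharrefreplace` error handler, which emits decimal references.

```python
def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal XML character reference.

    Hypothetical helper for illustration only. It does not escape markup
    characters such as & or <, so apply normal XML escaping separately.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = "is this a copyright symbol? \u00a9"
print(to_char_refs(sample))  # is this a copyright symbol? &#169;
```

The result is pure ASCII, so it reads the same regardless of which encoding the document declares.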

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hmmm... what NLS_LANG setting (i.e. character set) is your job using? UTF8? You want to ensure it supports those characters.
-craig

"You can never have too many knives" -- Logan Nine Fingers
cosec
Premium Member
Posts: 230
Joined: Tue May 08, 2007 8:10 pm

Post by cosec »

The top of the document reads as follows:
<?xml version="1.0" encoding="UTF-8"?>

I shall try using the equivalent decimal value and let you know if there is any change...

Do you mean the NLS_LANG setting for the DB or for DataStage?

Thanks.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Both.
-craig
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not having installed with NLS doesn't mean there's zero support for it, and the $NLS_LANG you use in your job is still important. Is it not set at all in the job's environment?
-craig
cosec
Premium Member
Posts: 230
Joined: Tue May 08, 2007 8:10 pm

Post by cosec »

For the DB, the settings are as follows:

DB2 Database:
Database Code Page = 1208
Database Code Set = UTF8

DataStage: using the project default.
From the job log I was able to obtain LANG=en_US.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Did some testing... perhaps this will help you in your research.

Environment is Windows XP, with DataStage 8.1 and FP1.

The XML document has a literal copyright character in it (as in your first posting) and also the decimal version of the copyright.

IE 7, in my environment, fails on the copyright symbol, saying it cannot read it, unless the declaration is encoding="iso-8859-1". Oddly enough, it fails with UTF-8 and UTF-16. Not sure why at this point; there are a lot of variables there. DataStage is the issue anyway, but this is something you should always check first in these cases: find out whether the browser(s) in your environment can read the document.

DataStage doesn't seem to care, on this machine, with a code page of 1252. The document reads just fine with no errors using any of the encodings above, although with iso-8859-1 the output also includes an accented capital A (Â) in front of the symbol. No doubt that is because the two bytes that represent a copyright character in UTF-8 are each being interpreted as a separate character.
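The stray accented A can be reproduced outside DataStage. This is a small Python sketch of the byte-level effect only; it says nothing about what the XML stage does internally, and the variable names are just for illustration.

```python
copyright_sign = "\u00a9"

# In UTF-8 the copyright sign takes two bytes: 0xC2 0xA9.
utf8_bytes = copyright_sign.encode("utf-8")

# Reading those two bytes as ISO-8859-1 yields two characters:
# 0xC2 is a capital A with a circumflex, 0xA9 is the copyright sign.
misread = utf8_bytes.decode("iso-8859-1")

# The reverse mismatch fails outright: the single ISO-8859-1 byte 0xA9
# is not a valid UTF-8 sequence on its own.
latin1_bytes = copyright_sign.encode("iso-8859-1")
try:
    latin1_bytes.decode("utf-8")
    single_byte_is_valid_utf8 = True
except UnicodeDecodeError:
    single_byte_is_valid_utf8 = False

print(utf8_bytes, repr(misread), single_byte_is_valid_utf8)
```

So a mismatch between the declared encoding and the bytes actually in the file can show up either as an extra character or as a hard parse error, depending on the direction of the mismatch.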

I'm not an expert on NLS, so I'm not sure whether it is a factor here. You can experiment with your encodings. There are a lot of variables, from the encoding, to the environment, to the NLS settings, to the release of DS. I would suggest trying a test document that has ONLY the decimal copyright in it, and seeing if that works. That's the brute-force method, and it should work in all cases with DS, IE, etc. Worst case, you can "zap" all the copyright values to the decimal string in an upstream Transformer.
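As a sanity check of the "decimal only" test document, here is a rough Python sketch using the standard library parser. It is a stand-in for the XML stage, not a replica of it, and `parse_test_doc` is a made-up helper name. The point is that a document containing only the decimal reference is pure ASCII, so the declared encoding should not change the result.

```python
import xml.etree.ElementTree as ET


def parse_test_doc(encoding: str) -> str:
    """Build a tiny ASCII-only test document and return the parsed text.

    Hypothetical helper: the document holds only the decimal reference
    &#169;, so its bytes are identical under any ASCII-compatible encoding.
    """
    doc = (
        f'<?xml version="1.0" encoding="{encoding}"?>'
        "<mydoc><string>&#169;</string></mydoc>"
    ).encode("ascii")
    return ET.fromstring(doc).findtext("string")


results = {enc: parse_test_doc(enc) for enc in ("UTF-8", "ISO-8859-1")}
print({enc: value == "\u00a9" for enc, value in results.items()})
```

If the same kind of document still fails in the job, the problem is more likely in the job's environment or NLS settings than in the document itself.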

Ernie
Ernie Ostic

cosec
Premium Member
Posts: 230
Joined: Tue May 08, 2007 8:10 pm

Post by cosec »

Thanks for all the effort... I will try some of the suggestions and update you if I am able to fix it. Thanks again!
jatayl
Premium Member
Posts: 47
Joined: Thu Jan 19, 2006 11:20 am
Location: Rogers, AR

Post by jatayl »

I had a similar issue with "special" characters in XML. The XML would not open in IE and would not parse in the XML stage, so, based on our requirements, I decided to convert the non-standard XML characters to blanks. Since the XML was generated by an application that let users enter anything they wanted in a field, I had to allow for unwanted characters ending up in the XML. I used a simple routine to clean up the XML before parsing it in the XML stage.
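A cleanup routine along those lines might look like the Python sketch below. This is an assumption about what such a routine could do, not the actual routine used: it blanks only characters that fall outside the XML 1.0 `Char` production (mostly control characters), and the name `blank_invalid_xml_chars` is made up. If the requirement is to blank every non-ASCII character as well, the pattern would need to be narrower.

```python
import re

# Characters allowed by the XML 1.0 "Char" production:
# tab, newline, carriage return, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
# Anything else cannot appear in a well-formed XML 1.0 document.
_INVALID_XML_CHARS = re.compile(
    "[^\t\n\r\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def blank_invalid_xml_chars(text: str) -> str:
    """Replace each character that XML 1.0 disallows with a single blank."""
    return _INVALID_XML_CHARS.sub(" ", text)


dirty = "user typed\x00 this\x0b value \u00a9"
print(repr(blank_invalid_xml_chars(dirty)))
```

Note that the copyright sign survives this filter because it is a legal XML character; it only causes trouble when the declared encoding and the actual bytes disagree.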

My thoughts.

Jason