white spaces in xml

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

knowledge
Participant
Posts: 101
Joined: Mon Oct 17, 2005 8:14 am

Post by knowledge »

Hi,
I am getting the following error in the XML Input stage.
The job design is:
Folder stage --> XML Input stage --> Sequential File stage.

[b]XML input document parsing failed. Reason: Xalan error (publicId: ,
systemId: , line: 826, column: 22): Datatype error:
Type:InvalidDatatypeValueException, Message:Value '8' does not match
any member types (of the union)[/b]

Line 826 is the <E19_04> 8</E19_04> node, which is causing the problem.

The schema details for this node are as follows:

...................................................................
<xs:element xmlns:xs="http://www.w3.org/2001/XMLSchema" minOccurs="0" name="E19_04" type="SizeOfProcedureEquipment">
  <xs:annotation>
    <xs:documentation>The size of the equipment used in the procedure on the patient</xs:documentation>
  </xs:annotation>
</xs:element>

<xs:simpleType name="SizeOfProcedureEquipment">
  <xs:annotation>
    <xs:documentation>The size of the equipment used in the procedure on the patient</xs:documentation>
  </xs:annotation>
  <xs:union memberTypes="NullValues">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:minLength value="2"/>
        <xs:maxLength value="20"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:union>
</xs:simpleType>

........................................
My guess is that the DataStage XML Input stage is removing the space before the digit in <E19_04> 8</E19_04>, keeping only '8', which is 1 character, while the definition (<xs:minLength value="2"/> and <xs:maxLength value="20"/>) requires at least 2 characters.
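The length argument in this guess can be sketched in a few lines of Python. XML Schema defines three values for the whiteSpace facet (preserve, replace, collapse); the sketch below applies each one to ' 8' and compares the result with the minLength of 2 from the posted schema. The function names are illustrative only, and which mode the parser inside DataStage actually applies to this union type is an assumption that the thread does not confirm.

```python
import re

MIN_LENGTH = 2  # from <xs:minLength value="2"/> in the posted schema


def normalize(value: str, mode: str) -> str:
    """Apply one of the three XSD whiteSpace modes to a lexical value."""
    if mode == "preserve":
        return value
    # "replace": every tab, line feed and carriage return becomes a space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # "collapse": after replace, squeeze runs of spaces and trim both ends.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace mode: {mode}")


def satisfies_min_length(value: str, mode: str) -> bool:
    """True if the normalized value still has at least MIN_LENGTH characters."""
    return len(normalize(value, mode)) >= MIN_LENGTH


if __name__ == "__main__":
    for mode in ("preserve", "replace", "collapse"):
        print(mode, repr(normalize(" 8", mode)), satisfies_min_length(" 8", mode))
```

If the value is collapsed before the length check, ' 8' becomes '8' and fails minLength 2; if it is preserved, the same value passes.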

How can I take the data as it is, that is, preserve the white space in the DataStage XML Input stage?
Please suggest.

Thanks,
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Not easy. Can you pre-process to replace them with non-breaking space (&NBSP;) characters?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

There's more to this than meets the eye. DataStage itself typically isn't doing anything at this point... it's all Xalan, which does the schema validation. Have you run this through any other validation mechanisms? I haven't been out to the site in a while, but I believe you can check validity here: http://www.w3.org/2001/03/webdata/xsv#hlp-warn

Alternatively, and perhaps even better, get a copy of XMLSpy and put your document through validation there.

I'll have to dig more into xs:union. I usually see minInclusive/maxInclusive, or a list of enumerated types.

Ernie
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Correction, as I re-read my note above: Xerces is doing the validating.

Ernie
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

...and if you go to www.apache.org and find and download Xerces, you will find http://xml.apache.org/xerces-c/stdinparse.html , and can perform command-line validation.

This looks cool. Will have to try it myself, too. I've usually depended on XMLSpy, but have a new laptop and haven't re-installed my old license. This could be better. I'll follow up after I try it.

Bottom line: check your XML instance document using external validation. If it fails validation, then either there's a bug in the parser (unlikely), or the usage case is wrong, or the data is represented incorrectly. If it passes validation, then we have to look more closely at why it fails under DS. It could be that the data gets manipulated or damaged on the way into the parser, or we have too old a parser, or something else.
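One quick way to see what is actually in the file, before any schema-driven processing, is to read the raw element text with a parser that does no validation. The sketch below uses Python's standard xml.etree.ElementTree, which keeps leading and trailing white space in element text. It is only a diagnostic aid, not a replacement for the external schema validation described above; the sample document and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET


def values_with_outer_whitespace(xml_text: str, tag: str) -> list:
    """Return the raw text of each <tag> element that starts or ends with white space."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter(tag):
        text = elem.text or ""
        # Compare against the text with only XML white space characters trimmed.
        if text != text.strip(" \t\r\n"):
            hits.append(text)
    return hits


# Hypothetical two-record document; the real vendor file layout is not shown in the thread.
SAMPLE = "<Record><E19_04> 8</E19_04><E19_04>12</E19_04></Record>"

if __name__ == "__main__":
    print(values_with_outer_whitespace(SAMPLE, "E19_04"))
```

If the vendor file declares a default namespace, the tag would have to be given in ElementTree's `{namespace-uri}E19_04` form.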

Ernie
knowledge
Participant
Posts: 101
Joined: Mon Oct 17, 2005 8:14 am

Post by knowledge »

Hi Ray,
How can I preprocess the file? I am getting the file from a vendor. Is there any way I can preprocess it? Please suggest. (I checked it with a validator; it validates OK but fails in DataStage.)

Hi Ernie,
I have downloaded the validator from the vendor's site, and all the files that fail in DataStage validate OK there.
I checked the file in the validator and it shows <E19_04> 8</E19_04>,
which means it preserves the white space, but if I open the same file in Explorer
it trims the white space and shows <E19_04>8</E19_04>.

I changed the XSD as follows (minLength is now "1"; it was "2" earlier):

<xs:simpleType name="SizeOfProcedureEquipment">
  <xs:annotation>
    <xs:documentation>The size of the equipment used in the procedure on the patient</xs:documentation>
  </xs:annotation>
  <xs:union memberTypes="NullValues">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:minLength value="1"/> <!-- was "2" earlier -->
        <xs:maxLength value="20"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:union>
</xs:simpleType>

But it still shows the same error.

Please suggest what the solution could be.
Thanks for your help,
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You could filter it through a sed or awk command (or maybe even tr depending on your exact requirements).
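For readers without sed or awk at hand, the same kind of filter can be sketched in Python. The version below follows the earlier suggestion in this thread and replaces leading white space inside E19_04 with non-breaking spaces (U+00A0), which XML Schema white space normalization does not touch because it only applies to tab, line feed, carriage return and space. The function name is hypothetical, and whether the XML Input stage then accepts the value is an assumption the thread does not confirm.

```python
import re

NBSP = "\u00a0"  # non-breaking space, not affected by XSD white space normalization

# Leading XML white space directly after the opening E19_04 tag (no attributes assumed).
LEADING_WS = re.compile(r"(<E19_04>)([ \t\r\n]+)")


def pad_with_nbsp(xml_text: str) -> str:
    """Replace each leading white space character in E19_04 with a non-breaking space."""
    return LEADING_WS.sub(lambda m: m.group(1) + NBSP * len(m.group(2)), xml_text)


if __name__ == "__main__":
    print(repr(pad_with_nbsp("<E19_04> 8</E19_04>")))
```

When applying this to a real file, read and write it with an explicit encoding that matches the XML declaration, since U+00A0 is not an ASCII character.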
knowledge
Participant
Posts: 101
Joined: Mon Oct 17, 2005 8:14 am

Post by knowledge »

[quote="ray.wurlod"]You could filter it through a [b]sed [/b]or [b]awk [/b]command (or maybe even [b]tr [/b]depending on your exact requirements). ...[/quote]

Hi all,
I thought the same, but when I contacted the vendor, he said he will resend all the files, replacing ' 8' with '8.0'.
So for the time being the problem is solved, as I am getting white space in only one tag.

Thanks a lot for your help,
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Good solution. This is a tough one. White space is generally supposed to be ignored as meaningless by XML... it seems various tools are treating it differently.

Ernie
knowledge
Participant
Posts: 101
Joined: Mon Oct 17, 2005 8:14 am

Post by knowledge »

[quote="eostic"]Good solution.. This is a tough one. White space is generally supposed to be ignored as meaningless by XML.... seems various tools are treating it differently.

Ernie[/quote]
Yes,
you are right.
Thanks.