Affects Version/s: None
Fix Version/s: 1.3
According to the XML 1.0 spec, the valid XML characters are
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
and no other characters are allowed to appear in an XML document, not even as character references.
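A minimal Java sketch of the check implied by the production above: a code point is legal in XML 1.0 only if it falls into one of the listed ranges, whether it is written literally or as a character reference. The class and method names (Xml10Chars, isValidXml10Char, firstInvalidIndex) are hypothetical and not part of XStream.

```java
/**
 * Hypothetical helper that checks code points against the XML 1.0 "Char" production:
 * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 */
final class Xml10Chars {

    private Xml10Chars() {}

    /** True if the code point may appear in an XML 1.0 document, literally or as a reference. */
    static boolean isValidXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Index of the first disallowed code point in s, or -1 if s is clean.
     * An unpaired surrogate is reported as invalid, because codePointAt
     * returns the lone surrogate value (0xD800-0xDFFF) in that case.
     */
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isValidXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstInvalidIndex("abc"));  // -1
        System.out.println(firstInvalidIndex("a\0b")); // 1
    }
}
```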
I tried to serialize the String "\0" and got an illegal NUL character in the XML file with the default serializer, and an invalid character reference to NUL with StaxWriter.
I had always misread the spec, since I thought it was possible to write those characters as character references. However, they may not appear in XML at all, which makes this really difficult. Escaping does not work either, since the XML parser would have to "unescape" those values too. Funnily enough, even the JDK's XML serialization writes the NUL character as a character reference and then fails when reading the generated XML back. Maybe we have to introduce some quirks mode for XStream's PrettyPrintWriter: as long as it is not operating in XML 1.0 or 1.1 mode, it should write those character references, if only for backward compatibility. XStream's default parser (Xpp3) happily ignores the spec and turns those references back into real values ...
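To illustrate why escaping cannot help, here is a small sketch using the JDK's built-in DOM parser (javax.xml.parsers): a reference to a legal character parses, while a reference to NUL is rejected as a fatal error. The helper name parses() is hypothetical, and the exact error text depends on the parser implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class CharRefCheck {

    /** Hypothetical helper: true if the JDK's default DOM parser accepts the document. */
    static boolean parses(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parses("<a>&#x41;</a>")); // true: 'A' is a legal character
        System.out.println(parses("<a>&#0;</a>"));   // false: NUL is illegal even as a reference
    }
}
```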
I've now committed a version of PrettyPrintWriter that works in different modes:
- QUIRKS: current behaviour; writes any non-printable character as a character reference (default)
- XML_1_0: throws a StreamException if a character should be written that is not allowed according to the XML 1.0 spec
- XML_1_1: throws a StreamException if a character should be written that is not allowed according to the XML 1.1 spec
QUIRKS is the default, both for compatibility and because the Xpp3 parser ignores the spec here and happily reads any kind of character reference. You may give the head revision a try.
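A hypothetical sketch of how mode-dependent escaping of element text could look. It is modelled on the three modes described above but is not XStream's code: the Mode enum and escapeText() are illustrative names, and IllegalArgumentException stands in for XStream's StreamException. In XML 1.1 the control characters #x1-#x1F are legal as character references, so that mode only rejects NUL, surrogates, #xFFFE and #xFFFF.

```java
/** Hypothetical sketch of mode-dependent text escaping; not XStream's implementation. */
final class ModeAwareEscaper {

    /** Illustrative stand-ins for the three modes described above. */
    enum Mode { QUIRKS, XML_1_0, XML_1_1 }

    /** XML 1.0 Char production. */
    static boolean allowedIn10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** XML 1.1 Char production (everything except NUL, surrogates, #xFFFE, #xFFFF). */
    static boolean allowedIn11(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Escapes element text; strict modes throw instead of emitting illegal references. */
    static String escapeText(String text, Mode mode) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '&') { out.append("&amp;"); continue; }
            if (cp == '<') { out.append("&lt;"); continue; }
            if (cp == '>') { out.append("&gt;"); continue; }
            boolean printable = cp == '\t' || cp == '\n'
                    || (allowedIn10(cp) && !Character.isISOControl(cp));
            if (printable) {
                out.appendCodePoint(cp);
                continue;
            }
            // Non-printable: whether a reference is legal depends on the mode.
            if ((mode == Mode.XML_1_0 && !allowedIn10(cp))
                    || (mode == Mode.XML_1_1 && !allowedIn11(cp))) {
                throw new IllegalArgumentException(
                        "Invalid character 0x" + Integer.toHexString(cp) + " for mode " + mode);
            }
            // QUIRKS (and legal cases in strict modes) write a hex character reference.
            out.append("&#x").append(Integer.toHexString(cp)).append(';');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeText("a<b", Mode.XML_1_0));      // a&lt;b
        System.out.println(escapeText("x\0y", Mode.QUIRKS));      // x&#x0;y
        System.out.println(escapeText("x\u0001y", Mode.XML_1_1)); // x&#x1;y
        try {
            escapeText("x\0y", Mode.XML_1_0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Invalid character 0x0 for mode XML_1_0
        }
    }
}
```

In this sketch QUIRKS never fails, which keeps the output readable by lenient parsers such as Xpp3, while the strict modes fail at write time instead of producing a document that a conforming parser will reject.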
Closing issues before next release.
I have the same issue with a serialized class.
I've tried all driver classes provided by the XStream package, but in every case, when I try to parse the serialized XML as a DOM4J document, I get the error: Character reference "�" is an invalid XML character
Several of the serialized characters are non-printable and not allowed in XML at all.
Why does XStream create them instead of escaping them or providing an entity declaration?
Any hints on how to handle such objects? From my point of view, producing a well-formed XML structure should be XStream's job. But maybe it's a general problem of binary data and XML and how to bring them together.
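One common workaround for data that may contain such characters (not something proposed in this thread) is to carry it as Base64, whose alphabet is plain ASCII and therefore always legal XML. A minimal sketch with java.util.Base64; the class name Base64Field is hypothetical, and with XStream this would presumably live in a custom converter for the affected field.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: carries arbitrary text through XML as Base64. */
final class Base64Field {

    /** Encodes the string's UTF-8 bytes; the result contains only [A-Za-z0-9+/=]. */
    static String encode(String raw) {
        return Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    /** Reverses encode(). */
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "a\0b";
        String safe = encode(raw);
        System.out.println(safe);                     // YQBi
        System.out.println(decode(safe).equals(raw)); // true
    }
}
```

For raw byte[] data, encode the bytes directly rather than going through a String; a String containing unpaired surrogates does not survive the UTF-8 round trip.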