1. XStream
  2. XSTR-623

XStream generates XML that is not well-formed (according to the XML specification) by writing illegal characters in names


    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.1
    • Fix Version/s: 1.4
    • Component/s: IO
    • Labels:
    • JDK version and platform:
      Sun 1.6.0_18 Windows XP 32bit


      XStream generates XML that is not well-formed according to the definition of the W3C recommendation [1].

      The problem arises because not all characters are legal in an XML name [2].

      XStream (in particular the ReflectionConverter) derives XML element names from Java identifiers, but not every legal Java identifier [3] is a legal XML element name.
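
      To illustrate the effect without involving XStream itself, the sketch below feeds two hand-written documents to the XML parser bundled with the JDK. The element name price£ stands in for a hypothetical Java field of that name; the class WellFormedCheck and its method are made up for this example and are not part of XStream.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedCheck {

    /** Returns true if the JDK's XML parser accepts the text as well-formed. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised by the parser for any well-formedness violation.
            return false;
        } catch (Exception e) {
            // I/O or parser configuration problems are not what we test here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical output for a field named "price".
        String plain = "<item><price>1</price></item>";
        // Hypothetical output for a field named "price£" (U+00A3).
        String pound = "<item><price\u00A3>1</price\u00A3></item>";

        System.out.println("price  : " + isWellFormed(plain));
        System.out.println("price£ : " + isWellFormed(pound));
    }
}
```

      Any conforming XML parser should reject the second document, because U+00A3 is not permitted anywhere in an XML name.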

      XStream already has a mechanism for generating "XML friendly names" (XmlFriendlyReplacer), but this replacer only replaces the $ sign.

      Since the JLS is a bit vague about which identifiers are legal, I implemented a small application that tests, for every Unicode character, whether it may start or continue a Java identifier and, if so, whether it may also start or continue an XML name.
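
      A scan of this kind could look roughly like the sketch below. It is not the original test application: the class and method names are invented, and the XML checks follow the NameStartChar and NameChar productions of XML 1.0 (Fifth Edition), which is an assumption. The number of code points it reports depends on that choice and on the Unicode version of the JDK, so it need not match the figure given in this report.

```java
import java.util.ArrayList;
import java.util.List;

final class IdentifierVsXmlName {

    /** NameStartChar of XML 1.0, Fifth Edition (assumed edition). */
    static boolean isXmlNameStart(int c) {
        return c == ':' || c == '_'
            || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
            || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
            || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
            || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
            || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    /** NameChar of XML 1.0, Fifth Edition (assumed edition). */
    static boolean isXmlNamePart(int c) {
        return isXmlNameStart(c) || c == '-' || c == '.'
            || (c >= '0' && c <= '9') || c == 0xB7
            || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
    }

    /** Code points usable in a Java identifier but not in the same position of an XML name. */
    static List<Integer> javaOnlyCodePoints() {
        List<Integer> result = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            boolean badStart = Character.isJavaIdentifierStart(cp) && !isXmlNameStart(cp);
            boolean badPart = Character.isJavaIdentifierPart(cp) && !isXmlNamePart(cp);
            if (badStart || badPart) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> found = javaOnlyCodePoints();
        System.out.println(found.size() + " code points differ");
        for (int cp : found) {
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```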

      The result: 85 characters are legal in Java identifiers but not in XML names. Most of them are unlikely to appear in Java identifiers on a typical American or European system, but the following are legal in Java identifiers and printable on such systems:
      '$', '¢', '£', '¤', '¥', 'ª', 'µ', 'º'

      Nevertheless, all the other characters are just as legal as part of a Java identifier.
      Even the Unicode code point RIGHT-TO-LEFT OVERRIDE (U+202E) [4] is legal! (I tried it with Eclipse Helios on Windows XP. It looks (and feels) weird, but it is legal and does not produce compile errors!)

      The solution to this issue would be to enhance the XmlFriendlyReplacer so that it replaces all 85 characters.
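
      One possible shape for such a replacement is sketched below. It is purely illustrative and does not describe how XmlFriendlyReplacer works or how the fix was implemented: the class NameEscaper, the "_x" prefix and the six-digit hex format are all assumptions made for this example. Every code point that is illegal at its position in an XML name is written as _xHHHHHH, and a literal underscore is doubled so that the mapping can be reversed. The XML checks again assume XML 1.0 (Fifth Edition).

```java
final class NameEscaper {

    /** NameStartChar of XML 1.0, Fifth Edition (assumed edition). */
    static boolean isXmlNameStart(int c) {
        return c == ':' || c == '_'
            || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
            || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
            || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
            || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
            || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    /** NameChar of XML 1.0, Fifth Edition (assumed edition). */
    static boolean isXmlNamePart(int c) {
        return isXmlNameStart(c) || c == '-' || c == '.'
            || (c >= '0' && c <= '9') || c == 0xB7
            || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
    }

    /** Turns a Java identifier into a string that is a legal XML name. */
    static String encode(String javaName) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < javaName.length(); ) {
            int cp = javaName.codePointAt(i);
            boolean legal = (i == 0) ? isXmlNameStart(cp) : isXmlNamePart(cp);
            if (cp == '_') {
                out.append("__");                            // escape the escape character
            } else if (legal) {
                out.appendCodePoint(cp);                     // copy unchanged
            } else {
                out.append(String.format("_x%06X", cp));     // hypothetical hex escape
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** Reverses encode(); expects input produced by encode(). */
    static String decode(String xmlName) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < xmlName.length(); ) {
            char c = xmlName.charAt(i);
            if (c != '_') {
                out.append(c);
                i++;
            } else if (xmlName.startsWith("__", i)) {
                out.append('_');
                i += 2;
            } else {
                // "_x" followed by exactly six hex digits
                out.appendCodePoint(Integer.parseInt(xmlName.substring(i + 2, i + 8), 16));
                i += 8;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] names = { "price\u00A3", "Outer$Inner", "my_field" };
        for (String name : names) {
            String encoded = encode(name);
            System.out.println(name + " -> " + encoded + " -> " + decode(encoded));
        }
    }
}
```

      Doubling the underscore keeps the scheme unambiguous: after encoding, a single underscore can only ever introduce a hex escape.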

      [1] http://www.w3.org/TR/REC-xml/#dt-wellformed
      [2] http://www.w3.org/TR/REC-xml/#NT-Name
      [3] http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.8
      [4] http://www.unicode.org/charts/PDF/U2000.pdf page 195, format characters


        • Assignee:
          Jörg Schaible
        • Reporter:
          Michael Schnell
        • Votes:
          0
        • Watchers:
          1


          • Created: