If I generate XHTML (as opposed to HTML), all browsers report an XML error and stop. The reason: the HTML file contains HTML entities, like `&nbsp;`, which are kept verbatim in the XHTML. Browsers do not understand those in XML; they need the numerical equivalent (in this case `&#160;`).
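To see the failure outside a browser, you can feed the three possible forms of a non-breaking space to a strict XML parser. This is a minimal sketch that uses Python's standard-library `xml.etree.ElementTree` as a stand-in for a browser's XML parser (an assumption). The helper name `parses_as_xml` is hypothetical:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def parses_as_xml(markup: str) -> bool:
    """Return True if a plain XML parser (no DTD loaded) accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


named = f'<p xmlns="{XHTML_NS}">&nbsp;</p>'    # HTML named entity
numeric = f'<p xmlns="{XHTML_NS}">&#160;</p>'  # numeric character reference
literal = f'<p xmlns="{XHTML_NS}">\u00a0</p>'  # the U+00A0 character itself

for label, markup in [("named", named), ("numeric", numeric), ("literal", literal)]:
    print(label, parses_as_xml(markup))
# Output:
# named False
# numeric True
# literal True
```

Only the named entity fails. XML predefines just `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`, so any other named entity is undefined unless a DTD declares it.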
Um... Ivan... serious question: isn't XHTML deprecated? Or is this for EPUB (which I think also now supports plain old HTML)?
Well... I do not know whether XHTML is deprecated or not, but I don't think so. Looking at §1.6 of the latest spec suggests otherwise. (I agree, though, that most, if not almost all, people producing Web pages these days use HTML.)
Yes, at present, it is important for EPUB. At this moment, the official EPUB3 relies on XHTML, per Core Media Types and also EPUB Content Documents 3.2. Many reading systems do accept HTML, too, but HTML is merely tolerated. I have not checked, but I would expect that the EPUB checker software, which most publishers use extensively, rejects HTML or at least issues a warning (@dauwhe, is that correct?). I would hope that this changes in an upcoming release, but that is where we are now.
I can't reproduce this. First, the supposedly affected spec does not include `&nbsp;`, and second, the exporter correctly replaces entities with the actual characters (e.g. `&nbsp;` with a literal U+00A0 non-breaking space and `&copy;` with ©).
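The exporter's own code isn't shown in this thread, so here is an illustrative sketch of the same kind of clean-up. It is a Python post-processing step that assumes the serialized markup is available as a string. It emits numeric references rather than literal characters; both are valid XML. The function name `named_to_numeric` is hypothetical. The entity table comes from Python's standard `html.entities.html5`:

```python
import re
from html.entities import html5

# XML predefines only these five named entities; all others must become a
# numeric reference (or the literal character) in DTD-less XHTML.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named references like "&nbsp;" (numeric ones such as "&#160;" start
# with '#', so they are not matched and pass through unchanged).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(markup: str) -> str:
    """Replace HTML named entities with numeric character references."""

    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)  # already valid in XML
        chars = html5.get(name + ";")
        if chars is None:
            return match.group(0)  # unknown name: leave it for the XML parser to flag
        # Some HTML entities expand to more than one code point.
        return "".join(f"&#{ord(c)};" for c in chars)

    return ENTITY_RE.sub(repl, markup)


print(named_to_numeric("a&nbsp;b &copy; &amp;"))  # a&#160;b &#169; &amp;
```

Keeping the five predefined entities untouched matters. A blanket `html.unescape` would also decode `&amp;` and `&lt;`, which would break the XML.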
@saschanaz, thanks for checking. Here is what I realized:
`&nbsp;` is indeed not in the original ReSpec source. ReSpec adds a non-breaking space for each reference (which is great!), and in Safari that ends up in the export as the `&nbsp;` entity rather than the Unicode character. I.e., my original assumption about the bug was wrong; the problem seems to be that something does not work as expected in Safari...
(I use the latest released version of Safari)
That sounds like an XMLSerializer implementation bug in Safari. I don't have a Mac, so my ability to test it is a bit limited. Could you file an issue on the WebKit side?
Hm. Is it 'just' an XMLSerializer bug or something in the context? I must admit I am not familiar with that level of API, so it is a bit awkward for me to file a bug whose details I do not really understand...
In that case I can do it myself. Could you confirm that this minimal repro shows the problem in Safari? It does in my Epiphany but not in Firefox or Chrome, so it's definitely something about XMLSerializer.
Confirmed, it's a Safari bug.
@saschanaz can you help me write the bug report for WebKit? I'm happy to file it. Does this sound OK?
Steps to reproduce:
Expected:
Serializing the `&nbsp;` should result in a U+00A0 (non-breaking space) character in the output. See Chrome and Firefox, which replace the `&nbsp;` with the correct code point.
Actual:
The serializer spits out:
<p xmlns="http://www.w3.org/1999/xhtml" id="nbsp">&nbsp;</p>
I'm a bit unsure whether the above is correct... when I check Firefox, it looks correct:

But Chrome outputs:

(We will probably need to consult the HTML spec to see what is supposed to happen to entities upon XML serialization... @travisleithead, as editor of the serialization spec, maybe you can save us a bit of time? Is Chrome/Safari right, or is Firefox?)
Well, the spec aligns with Firefox/Chrome behavior.
https://w3c.github.io/DOM-Parsing/#xml-serializing-a-text-node
Once parsed, `&nbsp;` is just a Text node in the DOM containing U+00A0, so it is serialized as the literal non-breaking space character; there's no need to entity-encode anything there.
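A simplified Python model of that behaviour follows. Only `&`, `<` and `>` are escaped when a Text node's data is XML-serialized, so a U+00A0 in the data comes out as the raw character. It is a sketch, not any browser's implementation. The spec's well-formedness checks are omitted, `html.unescape` stands in for the HTML parser's entity decoding, and `xml_serialize_text` is a hypothetical name:

```python
import html


def xml_serialize_text(data: str) -> str:
    """Escape a Text node's data the way the DOM Parsing spec describes:
    replace '&', '<' and '>' only (ampersand first, to avoid double-escaping)."""
    return data.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


# Parsing the HTML source turns the entity into a plain U+00A0 in the Text node...
text_node_data = html.unescape("&nbsp;")

# ...and serializing that Text node emits the character itself, not "&nbsp;".
print(repr(xml_serialize_text(text_node_data)))  # '\xa0'
print(xml_serialize_text("a < b & c"))            # a &lt; b &amp; c
```

Under this model, emitting `&nbsp;` would require an extra special case for U+00A0, which is the special case the spec does not have.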
Having said that, a lot of this spec was fiction, and there's work to be done (by someone at some point) to align it with implementations :) But to me, Firefox/Chrome's behavior makes the most sense since it doesn't require a special case. (E.g., I assume not all spaces get translated to `&nbsp;` when serialized.)
@marcoscaceres That's just the DOM inspector. Try `$0.textContent` in your console and it will show `"<p xmlns="http://www.w3.org/1999/xhtml" id="nbsp"> </p>"`, with a literal U+00A0 inside the `<p>`, as expected.
Edit: I mean your draft looks good 👍
ah! thanks for checking/explaining that @saschanaz. You are the best! Ok, so WebKit bug it is.
Ok, filed:
https://bugs.webkit.org/show_bug.cgi?id=207976
Closing, as it's not something we can fix here :) XHTML lives to fight another day!
Thanks also @travisleithead for the explanation and helping understand the state of that spec. Hopefully we can get around to updating it!
@marcoscaceres @saschanaz @travisleithead thanks for carrying this through. @saschanaz I only saw your note when starting my day a few minutes ago (the joys of cooperating across diverse time zones 😄).