Html: Remove parse error for dashes in comments <!-- -- -->

Created on 31 May 2016 · 14 comments · Source: whatwg/html

It has been 10 years since SGML-style comments were killed.
http://ln.hixie.ch/?start=1137799947&count=1

Reasons it's a parse error:
https://lists.w3.org/Archives/Public/public-whatwg-archive/2006May/0038.html

This has two benefits:
1) Previously non-conforming syntax won't be made conforming, so pedantic HTML 4 implementations won't break with conforming HTML5 comments.
2) Conforming HTML5 documents can be serialized as conforming XHTML5 document without data loss even when preserving comments. (Strictly speaking in XML processing it is legitimate to drop comments in the bit bucket, so in that sense, dropping comments would not be _significant_ data loss. :-)

(Plus, it was helpful for authors to get a hint from the validator if their page was broken in legacy Firefox that implemented SGML-style comments.)

I think these reasons are no longer relevant. The XML processing case is adequately covered by https://html.spec.whatwg.org/multipage/syntax.html#coercing-an-html-dom-into-an-infoset and nobody cares about what was valid HTML4, and nobody cares about pre-2006 Firefoxes.
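The linked infoset-coercion section is what keeps such comments representable in XML. As I read it, when the XML API cannot accept `--` inside a comment, a space is inserted between the consecutive hyphens, and when a comment ends in `-`, a space is appended. Below is a minimal Python sketch of that idea. It operates on comment data only, and the function name `coerce_comment_data_for_xml` is hypothetical rather than taken from any spec or library:

```python
def coerce_comment_data_for_xml(data: str) -> str:
    """Make HTML comment data acceptable to XML (hypothetical helper).

    XML forbids "--" inside a comment and forbids a trailing "-",
    because that "-" would run into the "-->" closer.
    """
    # str.replace does not see overlapping matches ("---" contains two),
    # so repeat until no "--" remains.
    while "--" in data:
        data = data.replace("--", "- -")
    # Pad a trailing hyphen so the serialized comment does not end in "--->".
    if data.endswith("-"):
        data += " "
    return data


print(repr(coerce_comment_data_for_xml("-----")))  # '- - - - - '
print(repr(coerce_comment_data_for_xml(" .block__elem--mod ")))
```

The point is that the DOM-to-XML step can already absorb dashes with a small, lossy but predictable rewrite, so forbidding them at the HTML syntax level is not strictly needed for XML output.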

So I suggest we make this be allowed:

<!--------------------------------->
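To make the proposal concrete, here is a sketch of the constraints on comment text before and after the change, paraphrased from the spec as I understand it. The function names are hypothetical, and `comment_data` assumes its input is exactly one simple `<!-- ... -->` comment:

```python
def comment_data(markup: str) -> str:
    # Assumes a single comment of the form "<!--" data "-->".
    if len(markup) < 7 or not (markup.startswith("<!--") and markup.endswith("-->")):
        raise ValueError("not a simple comment")
    return markup[4:-3]


def conforms_before(data: str) -> bool:
    # Pre-change (XML-compatible) rule: no leading ">" or "->",
    # no "--" anywhere, and no trailing "-".
    return not (
        data.startswith(">")
        or data.startswith("->")
        or "--" in data
        or data.endswith("-")
    )


def conforms_after(data: str) -> bool:
    # Proposed rule: runs of dashes are fine; what stays an error is text
    # that changes where the comment ends or looks like nesting.
    return not (
        data.startswith(">")
        or data.startswith("->")
        or "<!--" in data
        or "-->" in data
        or "--!>" in data
        or data.endswith("<!-")
    )


separator = "<!--" + "-" * 29 + "-->"
for markup in (separator, "<!-- .block__elem--mod -->", "<!-- ok -->"):
    data = comment_data(markup)
    print(markup, conforms_before(data), conforms_after(data))
```

Under this sketch, the separator line and a BEM-style class name both go from non-conforming to conforming, while an ordinary comment is conforming either way.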

cc @hsivonen @sideshowbarker

document conformance parser

Yes please let’s drop this requirement. We have plenty of evidence that it causes problems for authors without actually solving any problems for them:

(BTW I dunno what BEM is but I assume it must be some library or template thing or something that uses -- in comments.)

Currently --! in a comment is a parse error, but I don't see any reason for that to be the case:

<!-- --! -->

(But <!-- --!> should still be a parse error because it's a bogus way to close a comment.)

We need to introduce something new to make sure that "nesting" of comments is still caught by conformance checkers, so that this is still a parse error:

<!-- <!-- foo --> -->
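Nesting needs its own error once `--` is allowed because the first `-->` closes the comment: the inner comment's closer ends the outer one, and the trailing ` -->` leaks into the page as text. A small sketch of that, using a hypothetical helper that ignores `--!>` and the abrupt `<!-->` / `<!--->` edge cases:

```python
def split_first_comment(markup: str) -> tuple[str, str]:
    """Return (comment data, text left over after the comment closes)."""
    if not markup.startswith("<!--"):
        raise ValueError("not a comment")
    # The comment ends at the first "-->" after the opening "<!--".
    end = markup.find("-->", 4)
    if end == -1:
        raise ValueError("unterminated comment")
    return markup[4:end], markup[end + 3:]


data, rest = split_first_comment("<!-- <!-- foo --> -->")
print(repr(data))      # ' <!-- foo '
print(repr(rest))      # ' -->'
print("<!--" in data)  # True: the signal a checker can report as probable nesting
```

Without a ban on `--`, the leftover `<!--` inside the comment data is the only cheap signal a checker has that the author probably meant to nest.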

We have plenty of evidence that it causes problems for authors without actually solving any problems for them

This is persuasive.

But...

We need to introduce something new to make sure that "nesting" of comments is still caught by conformance checkers

This is persuasive in the other way even if one didn't care about XHTML serializability.

I'd prefer us not tweaking this stuff anymore.

Sorry, but is there any particular reason why disallowing -- in comments is problematic? I can barely see it causing any trouble. I’m not sure that four tweets are a good enough reason to make this change.

https://checker.html5.org/ now has experimental support for the spec text in the PR #1356 branch.

Also, sorry if I misunderstand the situation, but wouldn’t this, besides making HTML deviate from XML, also make WHATWG’s HTML deviate from W3C’s HTML? I strongly oppose this change.

Also, about the https://checker.html5.org/ support: as https://github.com/whatwg/html/pull/1356#issuecomment-225433921 notes, these are the new messages emitted:

Saw <!-- within a comment. Probable cause: Nested comment (not allowed).

Error: --!> found at end of comment (should just be -->).

Note also that you can use https://checker.html5.org/parsetree/ to view the resulting parse tree.

Hello @Zambonifofex. I don't see any new information in your comments. You are free to think that this is not an important problem to solve, though @sideshowbarker, who currently maintains checker.html5.org, seems to think otherwise. If you have any reasons why this is a bad change, then now is a good time to put them forward. I'm certainly open to changing my mind here if there is new information we have not yet considered.

Any change we make is going to be a change compared to W3C HTML. That is not in itself an argument against the change.

Hello @zcorpan, thanks for replying. I don’t think I have anything _new_ to share. I just don’t like this as a change. I don’t think it’s actually accomplishing anything useful. The only thing I see this change doing is making your HTML spec deviate from W3C’s. And even if you say that “that is not in itself an argument against the change”, it’s definitely not a good thing; I think it’s going to lead to more confusion than it’s going to fix.

It also adds a bunch of unnecessary exceptions (“irregularities”) to the standard. Such as disallowing --!> and <!-- inside of comments for no apparent good reason. I think this is going to cause more trouble than it is going to solve, as there are comments that are valid today and won’t be valid with this change.

The way things are currently neatly avoids cases such as <!-- <!-- --> --> and <!-- --!> without any inconsistencies.

I’ll also have to agree with @hsivonen. This is also going to further deviate HTML’s syntax from XML for no particularly good reason.

Making more things that can't be serialized in XHTML conforming in HTML makes me sad. I do realize that serializability to XHTML is of interest to fewer people than not getting errors for consecutive dashes in comments.

Still, I wish this change wasn't made and in general I wish we didn't tweak stuff like this anymore.

There is also, of course, the historical reason things are the way they are. Even setting the other reasons aside, I think it’s neat to preserve this prohibition for its historical value, especially since it seems to me that it’s actually doing more good than harm.

I just don’t see anything good that this change is going to bring. To me, it seems like such a trivial thing to be bothered about: not being able to have consecutive dashes in comments.

I don’t think it’s actually accomplishing anything useful.

What it’s accomplishing is no longer making something an error when there is no inherent good reason to treat it as an error in HTML. For the vast majority of HTML authors it causes absolutely no problems; most authors intuitively do not consider it an error (and would not be able to explain why it should be one); and authors very, very commonly put multiple lines like the following throughout their source:

<!--------------------------------------------------------------------------------->

…in order to clearly visually separate sections of the HTML source from one another for easier reading.

When we penalize every author who does that (e.g., when every HTML checker emits an error about it), we repeatedly waste the time of many, many authors whose attention would be better spent on noticing and fixing genuinely bad mistakes in their source that cause actual problems.

It also adds a bunch of unnecessary exceptions (“irregularities”) to the standard. Such as disallowing --!> and <!-- inside of comments for no apparent good reason.

Those are already disallowed by the current spec text, because it disallows -- completely, including in <!-- within a comment, and because it disallows --! completely (not just at the end of a comment).

In terms of behavior, adding new parse errors for those cases enables tools (like the HTML checker) to give users more specific, useful guidance in those cases, telling them:

You have <!-- in a comment; probably you accidentally nested one comment inside another.

Comment incorrectly ends in --!>; comments start with <!-- but must end in only --> (no !).
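Here is a hypothetical mini-checker for a single comment that emits those two kinds of message. The two message strings are copied from the checker.html5.org output quoted earlier in this thread; the function name, the "Unterminated comment." message, and the simplified closer detection (first of `-->` or `--!>` wins, ignoring the abrupt `<!-->` / `<!--->` cases) are my own assumptions, not the checker's actual code:

```python
NESTED = "Saw <!-- within a comment. Probable cause: Nested comment (not allowed)."
BANG_CLOSE = "--!> found at end of comment (should just be -->)."


def lint_comment(markup: str) -> list[str]:
    """Return checker-style messages for one comment starting with "<!--"."""
    if not markup.startswith("<!--"):
        raise ValueError("not a comment")
    body = markup[4:]
    messages = []
    normal = body.find("-->")
    bang = body.find("--!>")
    # Whichever closer comes first ends the comment; "--!>" still closes
    # it, but is reported as a bogus closer.
    if bang != -1 and (normal == -1 or bang < normal):
        data = body[:bang]
        messages.append(BANG_CLOSE)
    elif normal != -1:
        data = body[:normal]
    else:
        return ["Unterminated comment."]  # hypothetical wording
    if "<!--" in data:
        messages.append(NESTED)
    return messages


for markup in ("<!-- --! -->", "<!-- --!>", "<!-- <!-- foo --> -->"):
    print(markup, lint_comment(markup))
```

In this sketch, `<!-- --! -->` and a long dashed separator produce no messages, while the bogus closer and the nested opener each get one targeted message instead of a generic "no -- allowed" error.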

The other common case where this is a problem for people is that they use -- in their IDs or class names, and want to be able to put their IDs or class names inside HTML comments. There is a CSS naming scheme called "BEM" that recommends putting -- in class names. I think it is more useful to be able to do that than to have to somehow "escape" the double-dash, or to stop using the checker because it's complaining about non-problems.

You may think it's a trivial thing, but many trivial things that are useless errors in a checker make the checker itself useless, in my opinion.

BTW, it is pre-2011 Firefoxes, not pre-2006. More precisely, Firefox 3.6 and older.

Thanks. I thought it was changed in 2006 but https://bugzilla.mozilla.org/show_bug.cgi?id=214476 shows it was not.
