The HTML standard requires that almost every element be closed with a full end tag like </head>, </body>, </style>, etc., even though which element is being closed is usually obvious.
What I propose is that, rather than repeating the tag name, a bare </> could be used, as below:
<html>
<head>
<style>body{background-color:yellow;}</>
</>
<body>
<h1>heading</>
<p>text <b>bold text</> text</>
</>
</>
Benefits of this method are as follows:
HTML code will be shorter.
<head> and <body> do not need to be closed.
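To put a rough number on the first point, here is a quick, purely illustrative Python comparison (the strings mirror the example above; nothing here is measured from real pages):

# Purely illustrative: how many bytes the shorthand saves on the markup above.
classic = "<p>text <b>bold text</b> text</p>"
proposed = "<p>text <b>bold text</> text</>"
print(len(classic) - len(proposed))  # 2 -- one byte per end tag here; longer tag names save more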
I think you did not understand the main idea. Do not go deep into details; just catch the main idea.
Furthermore, I think it should be made mandatory to close every element, as in XHTML, because not everyone can memorize which elements must be closed and which may be left unclosed.
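For comparison, this is roughly what XHTML's strictness looks like in practice: in XML, a mismatched or missing end tag is a fatal error. A minimal sketch using Python's standard library:

import xml.etree.ElementTree as ET

# In XML (and therefore XHTML processed as XML), an unclosed element
# is a fatal well-formedness error; HTML parsers silently recover instead.
try:
    ET.fromstring("<body><p>text</body>")
except ET.ParseError as err:
    print("fatal:", err)  # e.g. "mismatched tag: line 1, column 15"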
Unfortunately, such a change to the HTML parser would not be backwards compatible and might also have security implications.
I cannot imagine a security problem if closing tags can be shortened this way. Can you give me an example of such a "security implication"?
Say you have an application that uses one of the various popular HTML “sanitization” libraries to transform user-input HTML content. These are usually whitelist-based. Although such a whitelist generally omits both <script> and <style>, I’m certain some applications disallow the former but not the latter. Assuming that’s the case here, consider this user input:
<style>
x::before {
  content: "</>";
}
y::before {
  content: "<script>alert('womp womp');</script>";
}
</style>
Currently, this will be parsed as one style element with a single text child. Even if one is additionally parsing and reserializing the stylesheet text (maybe stripping any invalid selectors or unrecognized properties), the attack would end up preserved; it’s perfectly legitimate CSS today. In user agents where </> could end the <style> element’s content, though, the text from the second quotation mark to the third becomes character data, and <script> begins a script element. Existing sanitizers would fail. HTML can’t introduce a feature knowing that it creates new XSS opportunities for every application that doesn’t update to match the change (including “resanitizing” any existing content in storage).
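To make the “one style element with one text child” claim concrete, here is a small check using Python’s built-in html.parser (a simplified stand-in for a browser, but it applies the same raw-text rule to <style> content):

from html.parser import HTMLParser

payload = """<style>
x::before {
  content: "</>";
}
y::before {
  content: "<script>alert('womp womp');</script>";
}
</style>"""

class TreeDump(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("start:", tag)
    def handle_endtag(self, tag):
        print("end:", tag)
    def handle_data(self, data):
        print("data:", repr(data))

# Prints "start: style", then ONE data chunk holding the whole stylesheet
# text (the "</>" and "<script>..." stay inert), then "end: style".
# A whitelist sanitizer that allows <style> therefore sees nothing to strip -- today.
TreeDump().feed(payload)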
To be fair, HTML’s parsing of the content of <script>, <style>, and a few other elements is special, and one could argue that these cases should simply remain as they are and never use </>. That would probably shut down many of the problems, but ...
This belongs to a class of issues that tend to arise any time one of these things is true:
In HTML, there is no fatal input at all, so the surface for syntactic extension is very limited, pretty much absent: new features build on existing syntax, so that older agents still produce the same document structure when new elements or attributes are introduced. HTML also has a fragment parse goal, and fragments are often treated as portable (as in the example above), so one would need a reliable way to communicate the intended interpretive mode within the source text even when it is not a complete document, e.g. a magic comment, or something that builds on the “bogus comment” productions and looks like a processing instruction or doctype declaration.
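Both properties are easy to observe with html5lib, a third-party Python implementation of the HTML parsing algorithm (assuming it is installed via pip install html5lib):

import html5lib  # third-party implementation of the HTML parsing algorithm

# No fatal input: any byte soup still parses into some document.
doc = html5lib.parse("</x> <b>unclosed <i>mess", namespaceHTMLElements=False)
print(doc.find("body") is not None)  # True -- no exception raised

# Separate fragment parse goal, used for "portable" snippets like the one above.
frag = html5lib.parseFragment("<td>cell", container="tr", namespaceHTMLElements=False)
print(frag[0].tag)  # td -- parsed relative to its intended context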
If existing pages contain the text </>, which currently gets thrown away by the parser as a parse error, then they will change behavior after this change. That is, if you today type something like <div>foo</>bar</div>, you'll get a div containing the text "foobar". After this feature is added, you'll instead get a div containing "foo", with "bar" as a text-node sibling after it, as if you'd typed <div>foo</div>bar</div>. That's the backwards-incompatible change.

Seem unlikely? Perhaps, but there are many trillions of HTML pages out there with all manner of badly-authored code, and we can't predict ahead of time how likely it is that this sort of content exists. We'd have to test, which takes some time and effort, and we only want to do that if the improvement is worth the cost.
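You can check today's "thrown away" behavior with Python's html.parser, which follows the HTML tokenizer on this point (the stray </> produces no event at all):

from html.parser import HTMLParser

class Events(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("start:", tag)
    def handle_endtag(self, tag):
        print("end:", tag)
    def handle_data(self, data):
        print("data:", repr(data))

# The </> is dropped as a parse error, so "foo" and "bar" arrive as
# adjacent character data that a DOM builder merges into "foobar".
Events().feed("<div>foo</>bar</div>")
# start: div / data: 'foo' / data: 'bar' / end: div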
Further, as stated, there are also lots of sanitization libraries that might (correctly, today) not strip that sort of tag from the content, and that would leave pages vulnerable to markup injection after this change. Unlike the last point, we can't easily measure this, and it's potentially a more serious issue as well.
Separately, I'll also note that this clashes with another existing HTML feature: the ability to omit end tags from many elements. Without a tag name, it becomes ambiguous which element an end tag should close. For example:
<div>
  <p>Some text
</div>
<div>
  More text
</div>

This is valid today, and produces the obvious markup structure - a div containing a p, with the second div as its sibling. If you switch to the more concise end tag, tho:

<div>
  <p>Some text
</>
<div>
  More text
</>

Do you have the same structure, meaning that the markup is now less obvious and you have to memorize the end-tag omission rules to know when it's allowed to use </>? Or does the first </> close the p, leaving the first div unclosed, so the second div is now its child rather than its sibling? There was no question before, even if you wrote the code weirdly; you'd just see a </div> and know it was closing the div, not the p.
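For the record, here is what the valid version produces today, checked with html5lib (assuming it is installed; the text content is just placeholder):

import html5lib

tree = html5lib.parse(
    "<div><p>Some text</div><div>More text</div>",
    namespaceHTMLElements=False,
)
body = tree.find("body")
print([child.tag for child in body])     # ['div', 'div'] -- siblings
print([child.tag for child in body[0]])  # ['p'] -- the </div> implicitly closed the p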