Html: Factor out sections dealing with general Web principles, Browsing the Web, and non-HTML media types

Created on 12 Jun 2020 · 5 Comments · Source: whatwg/html

HTML has many sections that are not strictly relevant to a media type, but that instead prescribe behavior for Web browsers to follow, regardless of whether HTML is supported, or whether the document is HTML at all. Orthogonal concerns such as these make it difficult for other standards to reference the HTML specification for uses other than Web browsers.

RFC 6838 describes the purpose of media types on the Internet, and their usage within the Web is elaborated in Architecture of the World Wide Web, Volume One. Together they suggest that a given media type (here, text/html and application/xhtml+xml) should have a single specification that defines how to handle documents, including across different version numbers.

While this specification does incorporate handling of older versions, it is difficult to work with in Web applications, because it reads more like "The Web Browser Specification" than a specification defining how a Web media type works.

For example, Browsing the Web, Processing Link Headers (see #4224), and Structured Data are sections not relevant to parsing HTML, and are used for processing non-HTML media types, yet somehow they have found a home in this specification.

The consequence is that if I want to write a user agent that renders a PDF or SVG document, or if I want to implement IndexedDB on the server side, doing so involves referring to this specification. This seems excessive.

These sections (and other similar ones) should be factored out into a common Web document processing model.

awwright

❤1

Most helpful comment

Parts of HTML have been split off historically. The way that has worked is that someone writes an independent specification that duplicates some part of HTML and demonstrates value, to the point that it makes more sense for HTML to reference that specification. Many specifications in https://spec.whatwg.org/ were originally in HTML (or parts of them were). (It might happen again with the WebSocket API soonish.)

So if you want to do something like this, that would probably be the way you're most likely to see success.

annevk on 12 Jun 2020

👍3

All 5 comments

Sorry, we're not interested in doing this.

domenic on 12 Jun 2020

👍3

@domenic Who's "we"? The guidelines for contributions suggest I should be able to get a more diverse body of feedback first.

awwright on 12 Jun 2020

This diagram seems relevant.

bathos on 12 Jun 2020

I can think of two cases.

1. The WPACK effort suffers because there isn't a clear separation of protocol, content, and user experience.

Is PDF part of "the web"? Every browser seems to do PDF, but the interaction model is tied to HTML. The only spec that covers page load is the description of how fragment identifiers are treated for the application/pdf media type in RFC 8118.