Lighthouse: Work with "application/xhtml+xml" documents

Created on 28 Sep 2020  路  7Comments  路  Source: GoogleChrome/lighthouse

Feature request summary

When running Lighthouse, ideally it would still work with the application/xhtml+xml mime type.

https://github.com/GoogleChrome/lighthouse/blob/7d685217df0083bc69ffd2e8360ea7a75dd8dbb9/lighthouse-core/gather/gather-runner.js#L226

xhtml

What is the motivation or use case for changing this?

It should be able to continue as a normal HTML document, rather than assuming all checks have failed.

I'm not expecting it to do any additional checks (the browser would have already ensured it's valid XML), but I'd still like it to work on my development server (I disable XML mode on Demo and Live just incase the website issues some HTML which isn't valid XML).

How is this beneficial to Lighthouse?

Lighthouse should be able to check a HTML document that's been served in XML mode - it's still a perfectly good HTML document, just authored to a stricter set of rules (e.g. correct nesting of tags, all attributes have been quoted, no double use of attributes on a tag).

P1.5 bug

Most helpful comment

Understood, I agree a completely fatal run for XHTML was a bit harsh compared to say a toplevel warning to indicate non-support instead.

We can let this issue track that effort. It seems like you already started some commits working on this would you be interested in a PR that does the warning conversion? :)

All 7 comments

Thanks for filing @craigfrancis!

Lighthouse advice is specifically targeted to HTML. Auditing an XHTML document yields several errors across Lighthouse as well as some incorrect advice, which we definitely wouldn't want to spread. We've explicitly decided to limit Lighthouse to HTML. Refer to https://github.com/GoogleChrome/lighthouse/issues/10206 and https://github.com/GoogleChrome/lighthouse/issues/9245 for more discussion and reasoning.

10206 seems a little odd, as the HTML5 spec includes XML mode, and HTML5 specifically says "A DOCTYPE is a required preamble", so the Lighthouse advice to include it is still correct (and I do). Just because the Mozilla website say's "you do not need a DOCTYPE to enable standards mode", that's not relevant.

9245 is talking about this warning, which should exist for generic application/xml, or other document types... but Lighthouse should, and used to, run the normal HTML checks on application/xhtml+xml documents.

I've been happily using Lighthouse for a while, and now I can't.


As an aside, XML mode, while probably too difficult for most websites (most third party code causes issues), it does provide a really good security guarantee by ensuring all attributes are quoted (and if not, the page fails to render).. for example, take this bit of HTML + PHP:

<img src=<?= htmlentities($url) ?> alt="Example" />

First glance it seems to be fine, but because the quotes have been forgotten, $url could be set to / onerror=alert(1) and you have an easy XSS vulnerability.

Understood, I agree a completely fatal run for XHTML was a bit harsh compared to say a toplevel warning to indicate non-support instead.

We can let this issue track that effort. It seems like you already started some commits working on this would you be interested in a PR that does the warning conversion? :)

@patrickhulce pointed out that https://github.com/GoogleChrome/lighthouse/pull/11042#issuecomment-652637567 lists application/xhtml+xml as the third most popular mime type in the June HTTP Archive run (after text/html and the empty string). It helpfully gives http://apple-store.in/ as an example serving with that mime type (and it is, but it doesn't validate :).

The report for it looks pretty good and doesn't have any of the problems of the pure xml document cited in #9245, so supporting this case SGTM too.

@patrickhulce, thanks... and the commit I was trying (somehow managed to create 3 with basically the same content) was just a quick experiment to see if I could allow application/xhtml+xml.

Personally I don't know Lighthouse well enough to create a relevant warning (UI changes), especially as application/xml probably should keep this warning (a fairly different beast, with different recommendations), whereas I've never had any problems with Lighthouse checking an application/xhtml+xml document authored to the HTML5 spec (my only problem with Lighthouse has been a CSP blocking robots.txt).

Got it @craigfrancis, adding a warning isn't too difficult (doesn't require any UI work, see below for example for redirect warning) since we just need to push an i18n string to an array from where your commit already was, but we can find someone else to wrap this up too if you'd prefer no worries :)

https://github.com/GoogleChrome/lighthouse/blob/259cafd3c377b3fadf3821ce2fc3a9b37e9eb18b/lighthouse-core/gather/gather-runner.js#L580-L583

I'm not in a position to test this at the moment (should be getting back to normal work), but maybe something like:

https://github.com/GoogleChrome/lighthouse/compare/master...craigfrancis:accept-xhtml

I suspect it would be better if someone else can finish this, as it does need testing, and probably needs a different message (with translations), instead of "The page provided is not HTML (served as MIME type application/xhtml+xml)"

That said, I'm not convinced a warning is needed.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

shellscape picture shellscape  路  3Comments

nl-igor picture nl-igor  路  3Comments

dkajtoch picture dkajtoch  路  3Comments

wizardnet972 picture wizardnet972  路  3Comments

radum picture radum  路  3Comments