Summary
So after https://github.com/GoogleChrome/lighthouse/issues/3891#issuecomment-525929813 I went to update our cssstyle dependency to fix the rgba(0,0,0,0) bug, but then it turns out the new version that fixes that introduces several other bugs. The valid color checking has been a whack-a-mole problem for years and it feels not very useful so I was curious what it looks like on real sites.
According to July HTTPArchive data...
Only .08% of sites ever trigger our invalid color logic and of that set ~42% are actually valid colors that we incorrectly flag as invalid (see trix).
Proposal A: Change invalid color check to a warning on PWA themed-omnibox and splash-screen. It feels like this is a warning-level thing anyway and we're almost wrong as often as we're right.
Proposal B: Remove invalid color check entirely. 20KB for something useful to .05% of sites feels like a bit much, false warnings aren't great either.
is removing an audit a "breaking change"?
is removing an audit a "breaking change"?
To clarify, it's not removing an existing audit. It's just changing the behavior of our two existing audits (themed-omnibox and splash-screen) to not fail just because a color was parsed as invalid.
For reference, it was from the Chromium source code for manifest parsing:
where I found that "background_color": "rgba(0,0,0,0)" is valid. Replicating their tests could be a good idea. I'm also not sure if "background_color": "cornflowerblue" would pass either, and that's the sample shown at:
https://developers.google.com/web/tools/lighthouse/audits/manifest-contains-background_color
FYI cornflowerblue should work just fine @clshortfuse, it's a CSS keyword (keywords covered by the test cases at https://chromium.googlesource.com/chromium/src.git/+/62.0.3178.1/content/renderer/manifest/manifest_parser_unittest.cc#1530)
I'm +1 on removing valid color checking. let's do it.
Only .08% of sites ever trigger our invalid color logic
isn't that generally true of most of our PWA audits, though? How many sites have a manifest at all?
My main issue with dropping it is that invalid CSS colors are treated as unset, so they really won't have a theme color set if the color is invalid.
It does look like we were missing a bunch of other functionality in cssstyle specifically for checking colors by trying to just valueType(), so it may help to switch to using that. I did notice there were some CSS 4 color forms missing, though.
What other new bugs came with up updating to 2.0.0?
(less important, but I also don't think "whack-a-mole problem" is really applicable. AFAICR we added color parsing from devtools-frontend way back at the dawn of time, then switched to this library when we pulled out the last bit of WebInspector :)
@patrickhulce I mean if cornflowerblue would pass on Lighthouse. I didn't go too far in my testing, but transparent fails on Lighthouse while it does pass fine on Chrome.
I guess there's a whitelist of some sorts on Lighthouse and it's missing transparent, maybe?
I guess there's a whitelist of some sorts on Lighthouse and it's missing transparent, maybe?
Correct, the version of the library we use does not contain transparent as a keyword. This is the "whack-a-mole" problem I was referring to.
How many sites have a manifest at all?
~6% of sites have an installable manifest. So the upper bound on invalid color applicability to PWA developers is ~.8% (.0008 / .06 * (1 - 0.42)).
What other new bugs came with up updating to 2.0.0?
They don't implement hex color validation correctly so for example our invalid test #1234567 passes as a valid color. They also didn't fix rgba correctly so now it incorrectly flags rgba(128, 64, 128, 0.4) as an invalid color even though rgba(0, 0, 0, 0) passes (they look for 4 integers).
I also don't think "whack-a-mole problem" is really applicable. AFAICR we added color parsing from devtools-frontend way back at the dawn of time, then switched to this library when we pulled out the last bit of WebInspector
I'm probably conflating with our validation of other PWA values that have popped up in iOS, but between the two we've had at least 4 updates to these sorts of validators and still have lots of missing invalid values.
My main issue with dropping it is that invalid CSS colors are treated as unset, so they really won't have a theme color set if the color is invalid.
This is fine but my bigger issue is that when we knowingly have incomplete coverage of a problem, whatever the reason may be, falsely failing for correct code is worse than warning it might have been a mistake especially when our accuracy is nearly as bad as a coin flip.
So I ran the spreadsheet of values through an actual setting of a css color to see which ones were valid, then compared against what the library can do.
If we
trim().toLowerCase() on the inputwe end up with 8 out of 4910 sites in that HTTPArchive data incorrectly flagged as invalid (8 out of 1265 unique color strings in that sheet). A breakdown:
"rgba(239,239,239,0.9"
"rgb(150, 191, 161"
apparently you're allowed to do this and the parser will treat the css function as done (see <EOF-token>)
"rgba(46, 49, 146)"
"rgba(255,238,20)"
"rgba(116,143,138)"
TIL CSS Colors 4 defines rgba() as just an alias for rgb() (both support an optional alpha)
"rgb( 0, 174, 255)"
"rgb( 0, 144, 54)"
"rgb( 91, 186, 213)"
cssstyle's regex incorrectly doesn't allow a space before the first entry
These could all be fixed pretty easily upstream (as well as the need for toLowerCase since "Unless otherwise specified, all matches are ASCII case-insensitive.")
Color strings incorrectly flagged as valid is a slightly bigger set, but we're also less worried about bugging users with these.
We end up with 29 out of 4910 sites in that HTTPArchive data we consider valid when they're actually invalid (16 out of 1265 unique color strings in that sheet).
20 sites (13 unique) use a seven-digit hex value ("#fffffff", "#1717171", "#0000000", "#ff5349f", "#F649941", "#1e87f01", "#659e367", "#f0f0f0f", "#ebe2daf", "#E21836F", "#C51af5b", "#f8f8f8E", "#0f46600"), which isn't valid. This could also be fixed easily upstream.
9 sites (3 unique) use a non-breaking space in their string ("#ffffff 聽", "#8dc63f 聽", "#ffffff聽") which isn't a valid CSS whitespace. This is an artifact of using trim() (which eats nbsp) vs a more correct .replace(/^[ \t\n\r\f]*/, '').replace(/[ \t\n\r\f]*$/, '');, which we could easily use instead, it just made the top-level bullet points look worse :). This is also different in the meta theme-color attribute, since \t etc in an attribute make the color not recognized by Chrome at least.
It's not clear to me if cssstyle wants to deal with starting and ending whitespace (we are reaching into their parsers, so maybe this is dealt with earlier in their pipeline), so this one may be up to us.
I recently submitted a PR to fix the regex used in clean-css: https://github.com/jakubpawlowicz/clean-css/pull/1083/files
I reworked it a bit since then, but here's a Regex that would follow the CSS spec:
^\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*(currentcolor|transparent|aliceblue|antiquewhite|aqua|aquamarine|azure|beige|bisque|black|blanchedalmond|blue|blueviolet|brown|burlywood|cadetblue|chartreuse|chocolate|coral|cornflowerblue|cornsilk|crimson|cyan|darkblue|darkcyan|darkgoldenrod|darkgray|darkgreen|darkgrey|darkkhaki|darkmagenta|darkolivegreen|darkorange|darkorchid|darkred|darksalmon|darkseagreen|darkslateblue|darkslategray|darkslategrey|darkturquoise|darkviolet|deeppink|deepskyblue|dimgray|dimgrey|dodgerblue|firebrick|floralwhite|forestgreen|fuchsia|gainsboro|ghostwhite|gold|goldenrod|gray|green|greenyellow|grey|honeydew|hotpink|indianred|indigo|ivory|khaki|lavender|lavenderblush|lawngreen|lemonchiffon|lightblue|lightcoral|lightcyan|lightgoldenrodyellow|lightgray|lightgreen|lightgrey|lightpink|lightsalmon|lightseagreen|lightskyblue|lightslategray|lightslategrey|lightsteelblue|lightyellow|lime|limegreen|linen|magenta|maroon|mediumaquamarine|mediumblue|mediumorchid|mediumpurple|mediumseagreen|mediumslateblue|mediumspringgreen|mediumturquoise|mediumvioletred|midnightblue|mintcream|mistyrose|moccasin|navajowhite|navy|oldlace|olive|olivedrab|orange|orangered|orchid|palegoldenrod|palegreen|paleturquoise|palevioletred|papayawhip|peachpuff|peru|pink|plum|powderblue|purple|rebeccapurple|red|rosybrown|royalblue|saddlebrown|salmon|sandybrown|seagreen|seashell|sienna|silver|skyblue|slateblue|slategray|slategrey|snow|springgreen|steelblue|tan|teal|thistle|tomato|turquoise|violet|wheat|white|whitesmoke|yellow|yellowgreen|#[A-F0-9]{3,4}|#[A-F0-9]{6}|#[A-F0-9]{8}|rgba?\((\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*,){2}\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*(,\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*)?(\)|$)|rgba?\((\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*,){2}\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*(,\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*)?(\)|$)|rgba?\((\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*\s+){2}\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*(\/\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*)?(\)|$)|rgba?\((\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*\s+){2}\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*(\/\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*)?(\)|$)|hsla?\(\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?(deg|grad|rad|turn)?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*(,\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*){2}(,\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*)?(\)|$)|(hsla?|hwb)\(\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?(deg|grad|rad|turn)?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*(\s+\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*){2}(\/\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*)?(\)|$)|device-cmyk\((\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*\s+){3}\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*(\/\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*)?(,([^()]*|\([^)]*\))*)?(\)|$)|lab\((\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*\s+){2}\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*(\/\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*)?(\)|$)|lch\((\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*\s+){2}\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?(deg|grad|rad|turn)?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*(\/\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*)?(\)|$)|gray\(\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*\s+(\/\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*[+-]?((\d+)|(\d*\.\d+))(e[+-]?\d+)?%?\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*)?(\)|$)|color\(([^()]*|\([^)]*\))*(\)|$))\s*(\/\*([^*]|\*(?!\/))*\*\/)*\s*$
It can look really confusing, but if follow the code it's pretty logical. You can see the gist for it here: https://gist.github.com/clshortfuse/acffca0a5f1412edcf9e132f33febe1c
As for the point of the EOF parsing, ECMAScript doesn't support \z or \Z (different languages use different terms), so, it's just using $.
There are some things to note:
<alpha-value> isn't bound to [0-1] or [0% - 100%]rgba is now alias for rgb so it can take 3 values (same with hsla/hsl)rgb can take 4 values (same with hsl)/[\n \t]/, but I'm using /\s/ to support both carriage return (\r) and form feed (\f).I'm not sure about too sure about the performance though. It might faster to trim it first, and convert to lowercase and checking against the named colors first.
Edit: Forgot about comments, so I had to add it in. I added in device-cmyk(), hwb(), lch(), lab(), gray(), and color(). Tried hand optimizing a bit as well. color() seems is probably too complicated to implement with Regex, and device-cmyk() needs recursion for its fallback value.
@clshortfuse if we still go this route the changes would be made upstream in cssstyle, where the different CSS color functions are split into separate regexes and a named color check, yes.
To be clear, I was never attempting to claim it's impossible to fix all the issues in cssstyle :) I'm arguing that the large gap is evidence of the fact that we'd essentially have to fill this gap ourselves and maintain this library for very little LH user value.
To quote a great mind,
I personally don't think omnibox colors are that important to a PWA experience
^^ this. On top of the tiny percentage of users this affects makes this just not seem worth it.
I wrote the crazy regex, and I kinda agree. It's going to be a cat and mouse game with standards. Validating the content of the color properties is really more of a courtesy than necessity.
Also, the Manifest spec itself is still in a working draft state. And the draft even lists that color processing steps are incorrect with a reference to this GitHub issue: https://github.com/w3c/manifest/issues/760 So realistically speaking, the process we implement now may change again later. There's also no direction as to how to handle EOF or non-terminated functions ).
On another note, ensuring a string is parse-able (regex), parsing, and validating are all different things. Having to validate or parse the color is a bit more work than just ensuring it can be parsed. I also have no idea what to do with env(). In theory, any user-agent can support something like env(system-tint-color). For example, Android now has a "tint" color on the system theme that perhaps could be added as a user-agent environment variable in the future. But we have no way to be able to parse that. The objective, should be to ensure it can be parsed, not parse it, if done at all.
And if we're referencing an external library that's purpose is to validate (and not just parse), cssstyle has no support for device-cmyk(). Nor does it support color() which has complexities considering there are multiple color spaces. So, even today, it would be incomplete. And taking a quick glance in master that cssstyle is missing the ability to parse 4 parameter rgb(), doesn't support commaless functions, and is too strict on its angle parsing (the unit is not required). So it's not just a clean upgrade either. And we'd be waiting for them to not only build a validator, but also the parser.
Edit: My terminology was mixed up with parsing/validating. Ensuring "parse-able" would regex check on the string. Reading the values into Objects would be "parsing". Converting those Objects into other formats would be related to validation.
We could close the gap between implementation and our color validation by not writing our own color validation (or rather, relying on cssstlye to be correct).
when the color is invalid, blink pushes an error to the parser: https://cs.chromium.org/chromium/src/third_party/blink/renderer/modules/manifest/manifest_parser.cc?l=164&rcl=65b3e11ce92ca41af626cb68ec84b795c90453ad
These end up as console errors: https://cs.chromium.org/chromium/src/third_party/blink/renderer/modules/manifest/manifest_manager.cc?l=194&rcl=ab4ecdb6a2dacf0a35344c2baa9509a9659eec86
I think that's a great approach @connorjclark, do we know if we need to listen on a specific target for those? I'm not seeing those messages in our existing errors-in-console audit results for sites that currently fail on the invalid color currently in HTTPArchive :/
that doesn't check meta theme-color, though, which is what #3891 wants to prioritize :)
that doesn't check meta theme-color, though, which is what #3891 wants to prioritize :)
But conceptually we could extend it to the exact same thing if that became the priority of the PWA guidelines, no? As in, Chrome would then have a vested interest in validating the meta tag not just the manifest value.
The only two places we use cssstyle today are for validating a color. My position is still that it's not worth bringing in and keeping up with this dependency for the little value most sites see from it.
We decided to pick up discussion next time though :)
My position is still that it's not worth bringing in and keeping up with this dependency for the little value most sites see from it.
I'm more agnostic today about how we validate colors, but I do think some of the PWA audits suffer from a recency bias in these type of conversations.
In a world where we're happy to pull in jsonld and hundreds of KiB of schemas, Lighthouse being able to correctly validate a CSS color (through whatever method) is a pretty low bar :)
While I cannot deny the applicability argument for your jsonld point (which I almost conclude with you on), the fact that "is this color valid" is such a low bar and easier to check without Lighthouse is also basically how I can rationalize the existence of the schema validation. The structured data validator was prescribed as the only remaining way to know if your structured data was nonsense. Still belongs as a plugin? Yeah probably, but the high complexity justifies its existence as a thing.
We've decided to move forward with dropping the color checking here.