Section | Chapter | Authors | Reviewers
-- | -- | -- | --
II. User Experience | 8. Security | @arturjanc @ScottHelme | @paulcalvano @bazzadp @ghedo @ndrnmnn
Due date: To help us stay on schedule, please complete the action items in this issue by June 3.
To do:
Current list of metrics:
TLS 🔒
Security Headers 📋
Cookies 🍪
Other ❓
👉 AI (coauthors): Assign peer reviewers. These are trusted experts who can support you when brainstorming metrics, interpreting results, and writing the report. Ideally this chapter will have 2 or more reviewers who can promote a diversity of perspectives.
👉 AI (coauthors): Finalize which metrics you might like to include in an annual "state of web security" report powered by HTTP Archive. Community contributors have initially sketched out a few ideas to get the ball rolling, but it's up to you, the subject matter experts, to know exactly which metrics we should be looking at. You can use the brainstorming doc to explore ideas.
The metrics should paint a holistic, data-driven picture of the web security landscape. The HTTP Archive does have its limitations and blind spots, so if there are metrics out of scope it's still good to identify them now during the brainstorming phase. We can make a note of them in the final report so readers understand why they're not discussed and the HTTP Archive team can make an effort to improve our telemetry for next year's Almanac.
Next steps: Over the next couple of months analysts will write the queries and generate the results, then hand everything off to you to write up your interpretation of the data.
Additional resources:
Happy to help review this section if you want, as it looks like you are light on reviewers.
That'd be great, thanks @bazzadp!
Other metrics that could be interesting (though I don't know how feasible it would be to collect them):
I'm also available if more reviewers are needed.
Thanks @ghedo! I've added you as a reviewer and sent you a team invite.
Those are all great stats to know @ghedo but I would caution the authors to be careful to not make it just about SSL/TLS. That’s very important obviously but we all know there is more to security than just that. Other resources (e.g. https://www.ssllabs.com/ssl-pulse/) measure nitty gritty details of SSL/TLS usage well for those of us really interested in the deep down detail of this.
In another post (https://github.com/HTTPArchive/almanac.httparchive.org/issues/1) @rviscomi suggested about 10 metrics per chapter, and I think if more than 3-4 of those for this chapter were about SSL/TLS then we could be at risk of concentrating on that too much and missing out on other interesting analysis.
IMHO we need to think about finding the stats that will be the most useful to the wider community and not just the security community for this report. And that might mean picking stats that represent the rough state of security (e.g. TLS version) rather than something more specific (specific cipher suites). Finding this balance of representative detail versus too much detail is something I’m also struggling with for my chapter on HTTP/2 (https://github.com/HTTPArchive/almanac.httparchive.org/issues/22) btw.
Anyway just my two cents, and don’t want to put people off suggesting metrics (it’s easier to whittle down a big list than to stretch up a little list!) but something I’m giving a bit of thought to for my chapter so thought I’d mention here too for consideration.
Oh, and one other point (and counter argument to my points above!): some stats will probably throw out a few surprises, so we should also be careful not to limit ourselves too much on stats because of preconceived ideas of what they will show. We can always exclude stats in the final report if they don’t show anything interesting.
@bazzadp I agree with you, and indeed I tried to list more general metrics (e.g. "sites that support modern/legacy ciphers"), rather than more specific ones (e.g. "distribution of specific ciphers"), though there's probably margin for improvement. Happy to discuss this more to try and get the list more focused (it's also likely that some of the metrics I proposed can't be easily collected anyway).
But it's worth noting that things like TLS versions, ciphers, certificate types, forward secrecy, and specific TLS features (e.g. OCSP stapling and 0-RTT) are things that people who maintain websites generally might have to deal with directly (because they maintain their own webserver) or indirectly (because some CDNs offer some additional configuration options), so it seems like it would be useful to have at least an overview of the status of these things.
Agreed. Gonna be a tough call to cull down to a list of just 10 or so metrics!
Don't feel limited by the 10 metrics suggestion. If you think 25 is manageable, go for it! That said, I do think you want to dedupe similar metrics so your report is holistic and easy to read.
@arturjanc @ScottHelme we're hoping to finalize the metrics for each chapter today. Can you look through https://github.com/HTTPArchive/almanac.httparchive.org/issues/10#issue-446806026 and modify it to include anything we're missing? @ghedo made a bunch of suggestions in https://github.com/HTTPArchive/almanac.httparchive.org/issues/10#issuecomment-497769163 that should be merged if they LGTY.
@paulcalvano @bazzadp @ghedo as reviewers, please also give the list of metrics one last look and shout if you think anything should be changed.
Once the metrics are in a good place, please tick the last TODO checkbox and close this issue.
Other metrics that could be interesting (though I don't know how feasible it would be to collect them):
- Count of sites that support TLS 1.3 0-RTT
Will the data be able to provide this given the requirement for a second, resumed connection?
- Distribution of certificate types (ECDSA vs. RSA)
Similar to above, it'd have to connect twice with a preferred suite at the top for each key type to know if a host was exclusively using ECDSA/RSA or just one key type for auth. Also need to know the client advertised suites on the connection.
- Count of sites that only support legacy TLS versions (1.0, 1.1). This could be useful as input to https://datatracker.ietf.org/doc/draft-ietf-tls-oldversions-deprecate/
Agreed, given the pending deprecation these would be worrying.
- Count of sites that only support legacy signature_algorithms when TLS 1.2 is negotiated (i.e. the algorithms using MD5 and SHA-1). This could be useful as input to https://datatracker.ietf.org/doc/draft-lvelvindron-tls-md5-sha1-deprecate/
Agreed.
- Count of sites that support modern ciphers (e.g. AES GCM, ChaCha20-Poly1305)
- Count of sites that only support legacy ciphers (e.g. AES CBC, 3DES, RC4, ...)
- Count of sites that support forward secrecy (i.e. they support (EC)DHE)
Linked to above again, we'd need multiple connections to determine overall support or we could just use the suite connected with.
- Count of sites that support OCSP stapling
Yep.
I'd like to suggest the inclusion of more headers than CSP, HSTS and FP. As a minimum suggestion:
In addition to that some of the older 'x-based' headers:
Given how new they are I think it'd be interesting to see features around the new Reporting API and other security related monitoring mechanisms:
@pmeenan could you do a quick sanity check that the Chrome profile is able to capture the metrics suggested here? For example, I think I recall OCSP stapling detection only being available in Firefox agents. If there are any flags/configs that need to be turned on to get any of this info, we should identify those before the July crawl.
Count of sites that support TLS 1.3 0-RTT
Will the data be able to provide this given the requirement for a second, resumed connection?
Is having two requests on the page over the same TLS 1.3 connection sufficient? If so yes it's something we can measure. If it requires a second page view then no.
What about pages being marked insecure?
As mentioned above, I’d also like more non-HTTPS related stats. Here’s ones I can think of:
Oh and one more HTTPS stat:
Vulnerable JS libraries is thankfully available as a Lighthouse audit.
As for CSP, +1 to everything. I'm also curious about policy length which might be an indicator of indiscriminate generation by some plugin/tool versus smaller more hand-crafted policies. It could also be a symptom of having too many third parties, so splitting by that dimension could be interesting.
Can someone take a stab at coalescing all of the suggested metrics into the top comment?
I'm also curious about policy length which might be an indicator of indiscriminate generation by some plugin/tool versus smaller more hand-crafted policies.
I’d say the opposite - the longer the policy, the more likely it’s been custom generated. Case in point: https://securityheaders.com/?q=twitter.com&followRedirects=on. But yeah think it would be good to see some metrics on length.
Can someone take a stab at coalescing all of the suggested metrics into the top comment?
Will leave @arturjanc and @ScottHelme to do that. Think I saw on Twitter that Scott is travelling at the mo.
@ScottHelme it was probably worded wrong, but what I meant with the "Count of sites that support modern/legacy ciphers" as well as certificate types and forward secrecy, was to check what the site negotiates by default, so we would just need a single connection, so no need to scan for the whole configuration (like SSL Labs does).
That is, given, say, a browser with a modern TLS configuration, but that nevertheless supports legacy algorithms, if the browser connection ends up using a legacy cipher suite then it means the site either _prefers_ that legacy configuration or simply doesn't support a modern one. So we can use that as indication of what normal web users would end up seeing with that particular site.
Also to be clear, the legacy/modern metrics would aggregate multiple negotiated ciphers into those two categories, so we wouldn't have separate metrics for each cipher, just "modern" vs. "legacy" depending on what is negotiated for the connection.
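To make the "modern" vs. "legacy" bucketing concrete, here is a minimal sketch of how the negotiated suite of each connection could be aggregated. It assumes the crawl exposes the negotiated cipher suite under its IANA name (e.g. `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256`); the function names and the exact marker list are hypothetical, and anything that is not AES GCM or ChaCha20-Poly1305 is simply counted as legacy. The forward secrecy check treats TLS 1.3 suites as forward secret, which ignores PSK-only resumption.

```python
from collections import Counter
from typing import Iterable

# Assumption: suite names use the IANA form,
# e.g. "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256".
MODERN_MARKERS = ("GCM", "CHACHA20_POLY1305")


def classify_cipher(suite: str) -> str:
    """Bucket one negotiated suite as "modern" (AEAD) or "legacy" (everything else)."""
    name = suite.upper()
    return "modern" if any(marker in name for marker in MODERN_MARKERS) else "legacy"


def supports_forward_secrecy(suite: str) -> bool:
    """True when the negotiated suite uses an (EC)DHE key exchange."""
    name = suite.upper()
    if "_WITH_" not in name:
        # TLS 1.3 suite names carry no key exchange; treated as forward secret here.
        return True
    return name.startswith(("TLS_ECDHE_", "TLS_DHE_"))


def summarize(negotiated: Iterable[str]) -> Counter:
    """Aggregate per-connection suites into the two categories."""
    return Counter(classify_cipher(suite) for suite in negotiated)
```

Feeding `summarize` one suite name per site gives the two counts directly, without scanning each site's full configuration.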
I'm a little late to the party but I wanted to add some more metrics and a few comments. Hopefully we can afterwards integrate all the ideas into a more coherent list as @bazzadp suggested above.
Some more security features, primarily focused on isolation (their use will be low this year):
- Cross-Origin-Resource-Policy
- Cross-Origin-Opener-Policy
- Vary response header which contains Sec-Fetch (indicating server-side logic that checks Fetch Metadata request headers)
- SameSite attribute on cookies. I think Chrome has more fine-grained UKM data about this, but it could be useful as well.
Trusted Types:
- Content-Security-Policy header with a trusted-types directive.
Flavors of Content Security Policy:
- frame-ancestors. It may make sense to combine this with the reporting for X-Frame-Options which serves the same purpose; rather than reporting two different values for complementary mechanisms we could have a "this site protects itself from framing" metric that looks at both headers.
- 'strict-dynamic'.
- 'unsafe-inline'.
- 'upgrade-insecure-requests'
I would also propose to remove X-XSS-Protection -- for better or worse it's now a Chrome-specific not-fully-maintained feature and the value is being explicitly set to 0 by many major webapps due to cross-origin information leaks. It might be best to not promote its further adoption.
One caveat about this is that some of these protections only make sense for origins with sensitive authenticated content, so the coverage may be quite low. I think it's perfectly fine (e.g. domains without login don't really need XSS protections), so it would be nice to convey this somehow.
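As an illustration of the combined "this site protects itself from framing" metric, here is a rough sketch that looks at both headers for a single response. It assumes the response headers are available as a dict keyed by lowercase header name, with multiple Content-Security-Policy headers joined by commas; the helper names are hypothetical, and treating any `frame-ancestors` list without `*` as protective is a simplification.

```python
def parse_csp(policy: str) -> dict:
    """Parse a single serialized policy into {directive: [values]}."""
    directives = {}
    for part in policy.split(";"):
        tokens = part.split()
        if not tokens:
            continue
        # A repeated directive is ignored; the first occurrence wins.
        directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives


def protects_from_framing(headers: dict) -> bool:
    """headers: lowercase header name -> raw value (assumed input shape)."""
    xfo = headers.get("x-frame-options", "").strip().lower()
    if xfo in ("deny", "sameorigin"):
        return True
    # Multiple policies may arrive comma-joined in one header value.
    for policy in headers.get("content-security-policy", "").split(","):
        ancestors = parse_csp(policy).get("frame-ancestors")
        if ancestors is not None and "*" not in ancestors:
            return True
    return False
```

Counting pages where `protects_from_framing` is true gives a single number instead of two separate values for the complementary mechanisms.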
We need to finalize the metrics for this chapter ASAP. Could someone update https://github.com/HTTPArchive/almanac.httparchive.org/issues/10#issue-446806026 with the agreed list of metrics? If there are any iffy metrics, let's include them anyway with a note for the Data Analyst team.
Hate to nag, but we need to get this resolved today to stay on schedule. There's a lot of good discussion in the comments but the final metrics list needs to reflect what the consensus is.
@arturjanc @ScottHelme could you make the call?
🛎 ping to get this closed out as soon as possible, it's one week overdue
@arturjanc @ScottHelme do either of you have time today to update the list of metrics in the top comment with the consensus from the thread? Would love to close this issue today and unblock the analysis.
Here's my attempt to cover all of the metrics discussed so far that seem to be viable.
Looks pretty good to me! I say update first comment with those and then close out this issue at some point tomorrow if there's no more comments.
Few other thoughts from me:
- CSP - should we differentiate between unsafe-inline in script-src and style-src? Imagine the latter might be common for some CMS's to allow styled content and IMHO not quite as dangerous as allowing in script-src.
- CSP - measure use of upgrade-insecure-requests? Imagine this is a common one to help sites migrate to HTTPS even if they don't use any other, more difficult to implement, CSP features.
Thanks a lot for synthesizing this, @ScottHelme, and apologies for not having time earlier this week! A couple of answers / notes:
CSP - should we differentiate between unsafe-inline in script-src and style-src?
tl;dr Yes. More broadly, determining the security of a policy is quite difficult because of CSP's inheritance logic and ignoring some keywords (e.g. 'unsafe-inline' is ignored when the policy has a nonce/hash, and a policy can be safe even without script-src, e.g. default-src 'none'). To get this right it might be helpful to use a library like the CSP Evaluator possibly with some tweaks to align it with what we need here.
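To show why a naive string match for 'unsafe-inline' is misleading, here is a minimal sketch of the fallback and "ignored keyword" logic described above. It is not the CSP Evaluator and does not replace it: it handles a single policy only, ignores script-src-elem and script-src-attr, and the function names are hypothetical. It also treats 'strict-dynamic' as disabling 'unsafe-inline', following CSP Level 3.

```python
HASH_OR_NONCE_PREFIXES = ("'nonce-", "'sha256-", "'sha384-", "'sha512-")


def parse_csp(policy: str) -> dict:
    """Parse a single serialized policy into {directive: [values]}."""
    directives = {}
    for part in policy.split(";"):
        tokens = part.split()
        if not tokens:
            continue
        # A repeated directive is ignored; the first occurrence wins.
        directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives


def allows_inline_script(policy: str) -> bool:
    """True when the policy effectively permits inline scripts."""
    directives = parse_csp(policy)
    # script-src falls back to default-src when it is absent.
    sources = directives.get("script-src", directives.get("default-src"))
    if sources is None:
        return True  # the policy does not restrict scripts at all
    lowered = [source.lower() for source in sources]
    overridden = "'strict-dynamic'" in lowered or any(
        source.startswith(HASH_OR_NONCE_PREFIXES) for source in lowered
    )
    # 'unsafe-inline' is ignored once a nonce, hash, or 'strict-dynamic' is present.
    return "'unsafe-inline'" in lowered and not overridden
```

For example, `default-src 'none'` has no script-src yet blocks inline scripts through the fallback, while a policy that only sets style-src leaves scripts unrestricted.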
CSP - measure use of upgrade-insecure-requests?
+1. This would also tie in nicely with your mixed content idea above.
Vary
I would look specifically for Vary: Sec-Fetch-Site or one of the other Sec-Fetch-* headers.
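A small sketch of that check, assuming the raw Vary header value is available as a string (the function name is hypothetical):

```python
def varies_on_fetch_metadata(vary_value: str) -> bool:
    """True when Vary lists Sec-Fetch-Site or another Sec-Fetch-* header."""
    fields = [field.strip().lower() for field in vary_value.split(",")]
    return any(field.startswith("sec-fetch-") for field in fields)
```

Header names are case-insensitive, so the comparison is done on lowercased field names.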
LGTM otherwise.
Updated with additional comments.
Agree with @arturjanc on the finer details around CSP and looking at it more closely, I'm not sure on how we'd integrate the evaluator. I will leave that up to greater minds to decide ;-)
LGTM. @ScottHelme can you edit @rviscomi's first comment in this issue (you should have edit permissions even though it's his comment) to replace the current metrics with this, and also tick the "Finalise metrics" checkbox in that comment and then Close the issue? Would do it myself but don't want to overstep my role as "reviewer" here ;-)
Updated the metrics in the top comment. Closing now. Thanks everyone!
Can the data tell us about the presence of a file? I'm thinking the security.txt file like this:
Unfortunately we're only aware of what is transferred over the network in the normal course of the page load. The exception is for Lighthouse audits that specifically check for these files, like the SEO audit for robots.txt. I don't think we will get it in time for this year's Almanac, but it might be a good idea to add a new Security audit to Lighthouse that checks this file.
@rviscomi As discussed offline, it might be interesting to track the usage of the WebAuthn API. A good read on general 2FA adoption across sites is Elie's blog post The bleak picture of two-factor authentication adoption in the wild.
Thanks @ndrnmnn! Do you know of the corresponding feature counter we could look at? For example, perhaps CredentialManagerGetPublicKeyCredential?
I've also sent you an invitation to be a reviewer of this chapter. Thanks again!
@rviscomi I checked some of the available metrics in chromestatus.com which seem to represent a funnel from credential creation to storage. We could therefore also include the CredentialManagerStore metric which in my understanding would represent the total number of credentials stored. WDYT?
The chromestatus feature counters are simple indicators that the API was used, not an aggregation of how they're used. And for that feature in particular, we don't have any pages in HTTP Archive that use it (the bottom chart is empty on that page)