Almanac.httparchive.org: Finalize assignments: Chapter 8. Security

Created on 21 May 2019 · 36 Comments · Source: HTTPArchive/almanac.httparchive.org

Due date: To help us stay on schedule, please complete the action items in this issue by June 3.

To do:

[x] Assign subject matter experts (coauthors)
[x] Assign peer reviewers
[x] Finalize metrics

Current list of metrics:

TLS 🔒
- Protocol Usage
  - SSLv2 / SSLv3 / TLSv1.0 / TLSv1.1 / TLSv1.2 / TLSv1.3
- Unique CA issuers
- RSA certificates
- ECDSA certificates
- Certificate validation level (DV / OV / EV)
- Cipher suite usage
  - Suites supporting Forward Secrecy (ECDHE / DHE)
  - Authenticated suites (GCM / CCM)
  - Modern suites (AES GCM, ChaCha20-Poly1305)
  - Legacy suites (AES CBC, 3DES, RC4)
- OCSP Stapling
- Session ID/Ticket assignment
- Sites redirecting to HTTPS
- Sites with degraded HTTPS UI (mixed-content)

Security Headers 📋
- Content Security Policy
  - Policies with frame-ancestors
  - Policies with 'nonce-'
  - Policies with 'hash-'
  - Policies with 'unsafe-inline'
  - Policies with 'unsafe-eval'
  - Policies with 'strict-dynamic'
  - Policies with 'trusted-types'
  - Policies with 'upgrade-insecure-requests'
- HTTP Strict Transport Security
  - Variance in max-age
  - Use of includeSubDomains
  - Use of preload token
- Network Error Logging
- Report To
- Referrer Policy
- Feature Policy
- X-Content-Type-Options
- X-Xss-Protection
- X-Frame-Options
- Cross-Origin-Resource-Policy
- Cross-Origin-Opener-Policy
- Vary (Sec-Fetch-* values)

Cookies 🍪
- Use of HttpOnly
- Use of Secure
- Use of SameSite
- Use of prefixes

Other ❓
- Use of SRI on subresources
- Vulnerable JS libraries (lighthouse?)

👉 AI (coauthors): Assign peer reviewers. These are trusted experts who can support you when brainstorming metrics, interpreting results, and writing the report. Ideally this chapter will have 2 or more reviewers who can promote a diversity of perspectives.

👉 AI (coauthors): Finalize which metrics you might like to include in an annual "state of web security" report powered by HTTP Archive. Community contributors have initially sketched out a few ideas to get the ball rolling, but it's up to you, the subject matter experts, to know exactly which metrics we should be looking at. You can use the brainstorming doc to explore ideas.

The metrics should paint a holistic, data-driven picture of the web security landscape. The HTTP Archive does have its limitations and blind spots, so if there are metrics out of scope it's still good to identify them now during the brainstorming phase. We can make a note of them in the final report so readers understand why they're not discussed and the HTTP Archive team can make an effort to improve our telemetry for next year's Almanac.

Next steps: Over the next couple of months analysts will write the queries and generate the results, then hand everything off to you to write up your interpretation of the data.

rviscomi

All 36 comments

Happy to help review this section if you want, as looks like you are light on reviewers?

bazzadp on 31 May 2019

🎉1

That'd be great, thanks @bazzadp!

rviscomi on 31 May 2019

Other metrics that could be interesting (though I don't know how feasible it would be to collect them):

Count of sites that support TLS 1.3 0-RTT
Distribution of certificate types (ECDSA vs. RSA)
Count of sites that only support legacy TLS versions (1.0, 1.1). This could be useful as input to https://datatracker.ietf.org/doc/draft-ietf-tls-oldversions-deprecate/
Count of sites that only support legacy signature_algorithms when TLS 1.2 is negotiated (i.e. the algorithms using MD5 and SHA-1). This could be useful as input to https://datatracker.ietf.org/doc/draft-lvelvindron-tls-md5-sha1-deprecate/
Count of sites that support modern ciphers (e.g. AES GCM, ChaCha20-Poly1305)
Count of sites that only support legacy ciphers (e.g. AES CBC, 3DES, RC4, ...)
Count of sites that support OCSP stapling
Count of sites that support forward secrecy (i.e. they support (EC)DHE)

I'm also available if more reviewers are needed.

ghedo on 31 May 2019

👍2

Thanks @ghedo! I've added you as a reviewer and sent you a team invite.

rviscomi on 1 Jun 2019

👍1

Those are all great stats to know @ghedo but I would caution the authors to be careful to not make it just about SSL/TLS. That’s very important obviously but we all know there is more to security than just that. Other resources (e.g. https://www.ssllabs.com/ssl-pulse/) measure nitty gritty details of SSL/TLS usage well for those of us really interested in the deep down detail of this.

In another post (https://github.com/HTTPArchive/almanac.httparchive.org/issues/1) @rviscomi suggested about 10 metrics per chapter and I think if more than 3-4 of those for this chapter were about SSL/TLS then we could be at risk of concentrating on that too much and missing out on other interesting analysis.

IMHO we need to think about finding the stats that will be the most useful to the wider community and not just the security community for this report. And that might mean picking stats that represent the rough state of security (e.g. TLS version) rather than something more specific (specific cipher suites). Finding this balance of representative detail versus too much detail is something I’m also struggling with for my chapter on HTTP/2 (https://github.com/HTTPArchive/almanac.httparchive.org/issues/22) btw.

Anyway just my two cents, and don’t want to put people off suggesting metrics (it’s easier to whittle down a big list than to stretch up a little list!) but something I’m giving a bit of thought to for my chapter so thought I’d mention here too for consideration.

bazzadp on 1 Jun 2019

Oh, and another point (and counter argument to my points above!) is that some stats will probably throw out a few surprises, so we should also be careful not to limit ourselves too much on stats because of preconceived ideas of what they will show. Can always exclude stats in the final report if they don’t show anything interesting.

bazzadp on 1 Jun 2019

@bazzadp I agree with you, and indeed I tried to list more general metrics (e.g. "sites that support modern/legacy ciphers"), rather than more specific ones (e.g. "distribution of specific ciphers"), though there's probably margin for improvement. Happy to discuss this more to try and get the list more focused (it's also likely that some of the metrics I proposed can't be easily collected anyway).

But it's worth noting that things like TLS versions, ciphers, certificate types, forward secrecy, and specific TLS features (e.g. OCSP stapling and 0-RTT), are things that people who maintain websites generally might have to deal with directly (because they maintain their own webserver) or indirectly (because some CDNs offer some additional configuration options), so it seems like it would be useful to have at least an overview on the status of these things.

ghedo on 1 Jun 2019

👍2

Agreed. Gonna be a tough call to cull down to a list of just 10 or so metrics!

bazzadp on 1 Jun 2019

Don't feel limited by the 10 metrics suggestion. If you think 25 is manageable, go for it! That said, I do think you want to dedupe similar metrics so your report is holistic and easy to read.

rviscomi on 1 Jun 2019

👍1

@arturjanc @ScottHelme we're hoping to finalize the metrics for each chapter today. Can you look through https://github.com/HTTPArchive/almanac.httparchive.org/issues/10#issue-446806026 and modify it to include anything we're missing? @ghedo made a bunch of suggestions in https://github.com/HTTPArchive/almanac.httparchive.org/issues/10#issuecomment-497769163 that should be merged if they LGTY.

@paulcalvano @bazzadp @ghedo as reviewers, please also give the list of metrics one last look and shout if you think anything should be changed.

Once the metrics are in a good place, please tick the last TODO checkbox and close this issue.

rviscomi on 4 Jun 2019

> Other metrics that could be interesting (though I don't know how feasible it would be to collect them):
>
> Count of sites that support TLS 1.3 0-RTT

Will the data be able to provide this given the requirement for a second, resumed connection?

> Distribution of certificate types (ECDSA vs. RSA)

Similar to above, it'd have to connect twice with a preferred suite at the top for each key type to know if a host was exclusively using ECDSA/RSA or just one key type for auth. Also need to know the client advertised suites on the connection.

> Count of sites that only support legacy TLS versions (1.0, 1.1). This could be useful as input to https://datatracker.ietf.org/doc/draft-ietf-tls-oldversions-deprecate/

Agreed, given the pending deprecation these would be worrying.

> Count of sites that only support legacy signature_algorithms when TLS 1.2 is negotiated (i.e. the algorithms using MD5 and SHA-1). This could be useful as input to https://datatracker.ietf.org/doc/draft-lvelvindron-tls-md5-sha1-deprecate/

Agreed.

> Count of sites that support modern ciphers (e.g. AES GCM, ChaCha20-Poly1305)
>
> Count of sites that only support legacy ciphers (e.g. AES CBC, 3DES, RC4, ...)
>
> Count of sites that support forward secrecy (i.e. they support (EC)DHE)

Linked to above again, we'd need multiple connections to determine overall support or we could just use the suite connected with.

> Count of sites that support OCSP stapling

Yep.

ScottHelme on 4 Jun 2019

👍2

I'd like to suggest the inclusion of more headers than CSP, HSTS and FP. As a minimum suggestion:

Referrer Policy

In addition to that some of the older 'x-based' headers:

X-Content-Type-Options
X-Xss-Protection
X-Frame-Options

Given how new they are I think it'd be interesting to see features around the new Reporting API and other security related monitoring mechanisms:

Report-To
NEL (Network Error Logging)
Expect-CT

ScottHelme on 4 Jun 2019

👍2

@pmeenan could you do a quick sanity check on the metrics suggested here that the Chrome profile is able to capture them? For example, I think I recall OCSP stapling detection only being available in Firefox agents. If there are any flags/configs that need to be turned on to get any of this info, we should identify those before the July crawl.

> Count of sites that support TLS 1.3 0-RTT
>
> Will the data be able to provide this given the requirement for a second, resumed connection?

Is having two requests on the page over the same TLS 1.3 connection sufficient? If so yes it's something we can measure. If it requires a second page view then no.

rviscomi on 4 Jun 2019

What about pages being marked insecure?

HTTP (not HTTPS) pages with Credit Card or Password fields.
HTTP (not HTTPS) pages with any input fields.
HTTPS pages with mixed content.

As mentioned above, I’d also like more non-HTTPS related stats. Here’s ones I can think of:

Amount of 3rd party content
Ads or Trackers per page (more privacy than security but can cause security issues and since we don’t have a privacy chapter...)
A measure of Cookies and what security options they use (HttpOnly, Secure, SameSite... etc.).
CSP is now mainstream so think we need more than just a measure of whether the header exists. Some analysis to try to see if it’s a useful policy (e.g. no unsafe-inline for script-src at least?). Maybe measure upgrade-insecure-requests type policies separately (a common one I suspect, that’s still useful but not really a CSP as such if it still allows everything and is only being used to migrate to HTTPS).
Average (or Total?) number of CSP alerts per page?
I like the suggested stat about vulnerable libraries. Not sure how to measure but something like this would be good. Limit to jQuery as an example? Or jQuery, Bootstrap, Angular, AngularJS, React and Vue? Or all libraries somehow?
SRI usage? Though personally I think it’s a bit pointless and better to self host (https://mobile.twitter.com/tunetheweb/status/1134559858353745923).
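
For the cookie item in the list above, a minimal sketch of how the security options on a single `Set-Cookie` header value could be extracted. The function name `cookie_security_flags` and the shape of the returned dictionary are hypothetical, and the parsing is deliberately simplified (it splits on `;` and does not validate the cookie).

```python
def cookie_security_flags(set_cookie: str) -> dict:
    """Report which security options one Set-Cookie header value uses."""
    parts = [p.strip() for p in set_cookie.split(";")]
    # The first part is "name=value"; everything after it is an attribute.
    name = parts[0].split("=", 1)[0].strip()
    attrs = {}
    for part in parts[1:]:
        if not part:
            continue
        key, _, value = part.partition("=")
        # Attribute names are case-insensitive.
        attrs[key.strip().lower()] = value.strip()
    prefix = next(
        (p for p in ("__Secure-", "__Host-") if name.startswith(p)), None
    )
    return {
        "secure": "secure" in attrs,
        "httponly": "httponly" in attrs,
        "samesite": attrs.get("samesite", "").lower() or None,
        "prefix": prefix,
    }


flags = cookie_security_flags(
    "__Host-sid=abc; Path=/; Secure; HttpOnly; SameSite=Lax"
)
print(flags)
```

Aggregating these dictionaries over every cookie seen in a crawl would give the per-attribute usage counts.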

Oh and one more HTTPS stat:

Base domains (e.g. example.com) where certificate doesn’t cover www variant (e.g. www.example.com) or vice versa.
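
A rough sketch of that last check, assuming the certificate's subject alternative names are already available as a list of DNS names. The helper names `name_matches` and `covers_both_variants` are hypothetical, and the wildcard handling is simplified to the common case of a single left-most `*` label.

```python
def name_matches(pattern: str, hostname: str) -> bool:
    """Return True if a certificate DNS name covers the hostname."""
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if pattern.startswith("*."):
        # A wildcard stands in for exactly one left-most label,
        # so "*.example.com" covers "www.example.com" but not "example.com".
        head, sep, tail = hostname.partition(".")
        return bool(head) and bool(sep) and "." + tail == pattern[1:]
    return pattern == hostname


def covers_both_variants(san_names: list, base_domain: str) -> bool:
    """Check that both example.com and www.example.com are covered."""
    wanted = [base_domain, "www." + base_domain]
    return all(any(name_matches(p, host) for p in san_names) for host in wanted)


print(covers_both_variants(["example.com", "www.example.com"], "example.com"))
print(covers_both_variants(["*.example.com"], "example.com"))
```

The second call illustrates the case the metric is after: a wildcard-only certificate covers the www variant but not the base domain.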

bazzadp on 4 Jun 2019

❤1 👍1

Vulnerable JS libraries is thankfully available as a Lighthouse audit.

As for CSP, +1 to everything. I'm also curious about policy length which might be an indicator of indiscriminate generation by some plugin/tool versus smaller more hand-crafted policies. It could also be a symptom of having too many third parties, so splitting by that dimension could be interesting.

rviscomi on 4 Jun 2019

👍1

Can someone take a stab at coalescing all of the suggested metrics into the top comment?

rviscomi on 4 Jun 2019

> I'm also curious about policy length which might be an indicator of indiscriminate generation by some plugin/tool versus smaller more hand-crafted policies.

I’d say the opposite - the longer the policy, the more likely it’s been custom generated. Case in point: https://securityheaders.com/?q=twitter.com&followRedirects=on. But yeah think it would be good to see some metrics on length.

> Can someone take a stab at coalescing all of the suggested metrics into the top comment?

Will leave @arturjanc and @ScottHelme to do that. Think I saw on Twitter that Scott is travelling at the mo.

bazzadp on 4 Jun 2019

👍1

@ScottHelme it was probably worded wrong, but what I meant with the "Count of sites that support modern/legacy ciphers" as well as certificate types and forward secrecy, was to check what the site negotiates by default, so we would just need a single connection, so no need to scan for the whole configuration (like SSL Labs does).

That is, given, say, a browser with a modern TLS configuration, but that nevertheless supports legacy algorithms, if the browser connection ends up using a legacy cipher suite then it means the site either _prefers_ that legacy configuration or simply doesn't support a modern one. So we can use that as indication of what normal web users would end up seeing with that particular site.

Also to be clear, the legacy/modern metrics would aggregate multiple negotiated ciphers into those two categories, so we wouldn't have separate metrics for each cipher, just "modern" vs. "legacy" depending on what is negotiated for the connection.
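
A minimal sketch of that aggregation, assuming the crawl records the negotiated suite as an IANA-style name such as `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256`. The marker lists follow the modern/legacy examples given in this thread, and the function names are hypothetical.

```python
from collections import Counter

# Substrings taken from the examples in this thread:
# modern = AES GCM, ChaCha20-Poly1305; legacy = AES CBC, 3DES, RC4.
MODERN_MARKERS = ("GCM", "CHACHA20")
LEGACY_MARKERS = ("CBC", "3DES", "RC4")


def classify_suite(name: str) -> str:
    """Bucket one negotiated cipher suite name as modern, legacy or unknown."""
    upper = name.upper()
    if any(marker in upper for marker in MODERN_MARKERS):
        return "modern"
    if any(marker in upper for marker in LEGACY_MARKERS):
        return "legacy"
    return "unknown"


def summarize(negotiated_suites: list) -> Counter:
    """Count sites per bucket, given one negotiated suite per site."""
    return Counter(classify_suite(name) for name in negotiated_suites)


sample = [
    "TLS_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256",
    "TLS_RSA_WITH_AES_128_CBC_SHA",
    "TLS_RSA_WITH_RC4_128_SHA",
]
print(summarize(sample))
```

Names in other formats (for example OpenSSL's `AES128-SHA`, which omits the mode) would land in the "unknown" bucket and need a lookup table instead of substring matching.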

ghedo on 4 Jun 2019

👍2

I'm a little late to the party but I wanted to add some more metrics and a few comments. Hopefully we can afterwards integrate all the ideas into a more coherent list as @bazzadp suggested above.

Some more security features, primarily focused on isolation (their use will be low this year):

Cross-Origin-Resource-Policy
Cross-Origin-Opener-Policy
The Vary response header which contains Sec-Fetch (indicating server-side logic that checks Fetch Metadata request headers)
The use of the SameSite attribute on cookies. I think Chrome has more fine-grained UKM data about this, but it could be useful as well.

Trusted Types:

The Content-Security-Policy header with a trusted-types directive.

Flavors of Content Security Policy:

CSPs which prevent framing, i.e. include frame-ancestors. It may make sense to combine this with the reporting for X-Frame-Options which serves the same purpose; rather than reporting two different values for complementary mechanisms we could have a "this site protects itself from framing" metric that looks at both headers.
Policies which use CSP2 nonces/hashes and CSP3 'strict-dynamic'.
Policies which try to protect against XSS and don't have 'unsafe-inline'.
Policies with 'upgrade-insecure-requests'
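
The combined "this site protects itself from framing" metric could be sketched roughly as follows. This assumes response headers are available as a simple name-to-value mapping; the function name is hypothetical, and treating `frame-ancestors *` as no protection and ignoring `ALLOW-FROM` are simplifying assumptions rather than anything agreed in this thread.

```python
def protects_from_framing(headers: dict) -> bool:
    """True if X-Frame-Options or CSP frame-ancestors restricts framing."""
    normalized = {k.lower(): v for k, v in headers.items()}

    # X-Frame-Options: only DENY and SAMEORIGIN are counted here.
    xfo = normalized.get("x-frame-options", "").strip().lower()
    if xfo in ("deny", "sameorigin"):
        return True

    # Content-Security-Policy: look for a frame-ancestors directive.
    csp = normalized.get("content-security-policy", "")
    for directive in csp.split(";"):
        tokens = directive.strip().split()
        if tokens and tokens[0].lower() == "frame-ancestors":
            # A bare wildcard allows any embedder, so it is not protection.
            return "*" not in tokens[1:]
    return False


print(protects_from_framing({"X-Frame-Options": "DENY"}))
print(protects_from_framing(
    {"Content-Security-Policy": "default-src 'self'; frame-ancestors 'none'"}
))
```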

I would also propose to remove X-XSS-Protection -- for better or worse it's now a Chrome-specific not-fully-maintained feature and the value is being explicitly set to 0 by many major webapps due to cross-origin information leaks. It might be best to not promote its further adoption.

One caveat about this is that some of these protections only make sense for origins with sensitive authenticated content, so the coverage may be quite low. I think it's perfectly fine (e.g. domains without login don't really need XSS protections), so it would be nice to convey this somehow.

arturjanc on 4 Jun 2019

❤1

Certificate Transparency compliance might be misleading to report on. In its current state, the % of compliant certificates is largely based on how new certificates are. This might be a more meaningful metric to track for next year's Almanac.
I don't think we can identify 0-RTT support in this data.
cipher strength would be interesting to track, especially by 3rd parties.
SubResourceIntegrity usage would be interesting

paulcalvano on 4 Jun 2019

👍1

We need to finalize the metrics for this chapter ASAP. Could someone update https://github.com/HTTPArchive/almanac.httparchive.org/issues/10#issue-446806026 with the agreed list of metrics? If there are any iffy metrics, let's include them anyway with a note for the Data Analyst team.

rviscomi on 6 Jun 2019

Hate to nag, but we need to get this resolved today to stay on schedule. There's a lot of good discussion in the comments but the final metrics list needs to reflect what the consensus is.

@arturjanc @ScottHelme could you make the call?

rviscomi on 7 Jun 2019

🛎 ping to get this closed out as soon as possible, it's one week overdue

rviscomi on 10 Jun 2019

@arturjanc @ScottHelme do either of you have time today to update the list of metrics in the top comment with the consensus from the thread? Would love to close this issue today and unblock the analysis.

rviscomi on 11 Jun 2019

Here's my attempt to cover all of the metrics discussed so far that seem to be viable.

TLS 🔒
- Protocol Usage
  - SSLv2 / SSLv3 / TLSv1.0 / TLSv1.1 / TLSv1.2 / TLSv1.3
- Unique CA issuers
- RSA certificates
- ECDSA certificates
- Certificate validation level (DV / OV / EV)
- Cipher suite usage
  - Suites supporting Forward Secrecy (ECDHE / DHE)
  - Authenticated suites (GCM / CCM)
  - Modern suites (AES GCM, ChaCha20-Poly1305)
  - Legacy suites (AES CBC, 3DES, RC4)
- OCSP Stapling
- Session ID/Ticket assignment
- Sites redirecting to HTTPS
- Sites with degraded HTTPS UI (mixed-content)

Security Headers 📋
- Content Security Policy
  - Policies with frame-ancestors
  - Policies with 'nonce-'
  - Policies with 'hash-'
  - Policies with 'unsafe-inline'
  - Policies with 'unsafe-eval'
  - Policies with 'strict-dynamic'
  - Policies with 'trusted-types'
  - Policies with 'upgrade-insecure-requests'
- HTTP Strict Transport Security
  - Variance in max-age
  - Use of includeSubDomains
  - Use of preload token
- Network Error Logging
- Report To
- Referrer Policy
- Feature Policy
- X-Content-Type-Options
- X-Xss-Protection
- X-Frame-Options
- Cross-Origin-Resource-Policy
- Cross-Origin-Opener-Policy
- Vary (Sec-Fetch-* values)

Cookies 🍪
- Use of HttpOnly
- Use of Secure
- Use of SameSite
- Use of prefixes

Other ❓
- Use of SRI on subresources
- Vulnerable JS libraries (lighthouse?)

ScottHelme on 13 Jun 2019

❤3

Looks pretty good to me! I say update first comment with those and then close out this issue at some point tomorrow if there's no more comments.

Few other thoughts from me:

TLS: What about TLS warnings and errors (e.g. HTTPS sites with mixed content, HTTP sites with Credit Card or password fields or any forms, HTTPS sites that are blocked due to inadequate TLS or being on the Safe Browsing list)?
TLS: Sites with EV? Know it's not liked by many in security industry but may be interesting to see. Especially if it declines over the years.
CSP - should we differentiate between unsafe-inline in script-src and style-src? Imagine the latter might be common for some CMS's to allow styled content and IMHO not quite as dangerous as allowing in script-src.
CSP - measure use of upgrade-insecure-requests? Imagine this is a common one to help sites migrate to HTTPS even if they don't use any other, more difficult to implement, CSP features.
Other - Don't forget "Vulnerable JavaScript libraries". Think this is an important one and apparently easily enough measured from above discussion.

bazzadp on 13 Jun 2019

Thanks a lot for synthesizing this, @ScottHelme, and apologies for not having time earlier this week! A couple of answers / notes:

> CSP - should we differentiate between unsafe-inline in script-src and style-src?

tl;dr Yes. More broadly, determining the security of a policy is quite difficult because of CSP's inheritance logic and ignoring some keywords (e.g. 'unsafe-inline' is ignored when the policy has a nonce/hash, and a policy can be safe even without script-src, e.g. default-src 'none'). To get this right it might be helpful to use a library like the CSP Evaluator possibly with some tweaks to align it with what we need here.
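
To illustrate why a plain substring match on 'unsafe-inline' is not enough, here is a simplified sketch of the two rules mentioned above: fallback to default-src, and 'unsafe-inline' being ignored when a nonce or hash is present. It is not the CSP Evaluator and does not model the full specification (for example 'strict-dynamic' or multiple policies); the function names are hypothetical.

```python
def parse_csp(policy: str) -> dict:
    """Split a policy string into {directive name: [source expressions]}."""
    directives = {}
    for part in policy.split(";"):
        tokens = part.strip().split()
        if not tokens:
            continue
        # If a directive is repeated, only the first occurrence is kept.
        directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives


def allows_inline(policy: str, directive: str = "script-src") -> bool:
    """Estimate whether inline content is allowed for the given directive."""
    directives = parse_csp(policy)
    # script-src and style-src fall back to default-src when absent.
    sources = directives.get(directive, directives.get("default-src"))
    if sources is None:
        return True  # the policy does not restrict this resource type at all
    lowered = [s.lower() for s in sources]
    has_nonce_or_hash = any(
        s.startswith(("'nonce-", "'sha256-", "'sha384-", "'sha512-"))
        for s in lowered
    )
    # 'unsafe-inline' only takes effect when no nonce or hash is present.
    return "'unsafe-inline'" in lowered and not has_nonce_or_hash


policy = "default-src 'self'; style-src 'self' 'unsafe-inline'"
print(allows_inline(policy, "script-src"), allows_inline(policy, "style-src"))
```

Calling the function once per directive gives the script-src versus style-src split asked about above.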

> CSP - measure use of upgrade-insecure-requests?

+1. This would also tie in nicely with your mixed content idea above.

> Vary

I would look specifically for Vary: Sec-Fetch-Site or one of the other Sec-Fetch-* headers.

LGTM otherwise.

arturjanc on 14 Jun 2019

Updated with additional comments.

Agree with @arturjanc on the finer details around CSP and looking at it more closely, I'm not sure on how we'd integrate the evaluator. I will leave that up to greater minds to decide ;-)

ScottHelme on 14 Jun 2019

LGTM. @ScottHelme can you edit @rviscomi's first comment in this issue (you should have edit permissions even though it's his comment) to replace the current metrics with this, and also tick the "Finalise metrics" checkbox in that comment and then Close the issue? Would do it myself but don't want to overstep my role as "reviewer" here ;-)

bazzadp on 14 Jun 2019

❤1

Updated the metrics in the top comment. Closing now. Thanks everyone!

rviscomi on 17 Jun 2019

Can the data tell us about the presence of a file? I'm thinking the security.txt file like this:

https://scotthelme.co.uk/.well-known/security.txt

ScottHelme on 20 Jun 2019

Unfortunately we're only aware of what is transferred over the network in the normal course of the page load. The exception is for Lighthouse audits that specifically check for these files, like the SEO audit for robots.txt. I don't think we will get it in time for this year's Almanac, but it might be a good idea to add a new Security audit to Lighthouse that checks this file.

rviscomi on 20 Jun 2019

@rviscomi As discussed offline, it might be interesting to track the usage of the WebAuthn API. A good read on general 2FA adoption across sites is Elie's blog post "The bleak picture of two-factor authentication adoption in the wild".

ndrnmnn on 24 Jun 2019

👍1

Thanks @ndrnmnn! Do you know of the corresponding feature counter we could look at? For example, perhaps CredentialManagerGetPublicKeyCredential?

I've also sent you an invitation to be a reviewer of this chapter. Thanks again!

rviscomi on 24 Jun 2019

@rviscomi I checked some of the available metrics in chromestatus.com which seem to represent a funnel from credential creation to storage. We could therefore also include the CredentialManagerStore metric which in my understanding would represent the total number of credentials stored. WDYT?

ndrnmnn on 25 Jun 2019

The chromestatus feature counters are simple indicators that the API was used, not an aggregation of how they're used. And for that feature in particular, we don't have any pages in HTTP Archive that use it (the bottom chart is empty on that page)

rviscomi on 25 Jun 2019
