| Authors | Reviewers | Analysts | Draft | Queries | Results |
| ------- | --------- | -------- | ----- | ------- | ------- |
| @nrllh @tomvangoethem | @cqueern @bazzadp @edmondwwchan | @AAgar @tomvangoethem | Doc | *.sql | Sheet |
Content team lead: @nrllh
Welcome chapter contributors! You'll be using this issue throughout the chapter lifecycle to coordinate on the content planning, analysis, and writing stages.
The content team is made up of the following contributors:
New contributors: If you're interested in joining the content team for this chapter, just leave a comment below and the content team lead will loop you in.
_Note: To ensure that you get notifications when tagged, you must be "watching" this repository._
I'd like to volunteer as an analyst. I've used HTTPArchive in some of my (academic) research, so I have some familiarity with the datasets.
@rviscomi as discussed recently, I would also like to join this chapter.
@tomvangoethem added you as an analyst :)
Hello Team. I would like to participate as a Reviewer please. :grinning:
@nrllh thank you for agreeing to be the lead author for the Security chapter! As the lead, you'll be responsible for driving the content planning and writing phases in collaboration with your content team, which will consist of yourself as lead, any coauthors you choose as needed, peer reviewers, and data analysts.
The immediate next steps for this chapter are:
There's a ton of info in the top comment, so check that out and feel free to ping myself or @obto with any questions!
I'm happy to review this chapter again this year btw. Added myself to first comment.
@ivanr @april would you have any interest in helping out with this chapter this year? Last year's chapter for context: https://almanac.httparchive.org/en/2019/security
@tomvangoethem, assigned you also as author ;)
Hey @nrllh, just checking in:
@tomvangoethem @cqueern @bazzadp can you please request edit access and then credit yourself in the Google Doc?
Hello team, may I contribute as a reviewer too?
@edmondwwchan welcome to the club!
Please request edit access and credit yourself in the doc.
@nrllh How's the chapter outline coming along? We want to have that wrapped up by the end of the week so we have time to set up our Web Crawler :)
As discussed on Slack I think we should ask for custom metrics for vulnCount, Library (including version), Library (excluding version) and highestSeverity from the no-vulnerable-libraries lighthouse metric:
"no-vulnerable-libraries": {
"description": "Some third-party scripts may contain known security vulnerabilities that are easily identified and exploited by attackers. [Learn more](https://developers.google.com/web/tools/lighthouse/audits/vulnerabilities).",
"title": "Includes front-end JavaScript libraries with known security vulnerabilities",
"score": 0,
"details": {
"items": [
{
"vulnCount": 4,
"detectedLib": {
"url": "https://snyk.io/vuln/npm:jquery?lh=1.4.4&utm_source=lighthouse&utm_medium=ref&utm_campaign=audit",
"text": "jquery@1.4.4",
"type": "link"
},
"highestSeverity": "Medium"
}
],
"type": "table",
"headings": [
{
"text": "Library Version",
"itemType": "link",
"key": "detectedLib"
},
{
"text": "Vulnerability Count",
"itemType": "text",
"key": "vulnCount"
},
{
"text": "Highest Severity",
"itemType": "text",
"key": "highestSeverity"
}
],
"summary": []
},
"scoreDisplayMode": "binary",
"displayValue": "4 vulnerabilities detected",
"id": "no-vulnerable-libraries"
},
Related to https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/sql/2019/08_Security/08_40.sql and https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/sql/2019/08_Security/08_40b.sql but want MOAR detail which is more easily queryable
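A minimal sketch of how the requested fields could be pulled out of that audit, assuming the audit object has the shape shown in this comment and that `detectedLib.text` has the form `name@version` (for example `jquery@1.4.4`, matching the `lh=1.4.4` in the sample URL). The function name `summarizeVulnerableLibraries` and the output field names are hypothetical, not an existing custom metric.

```javascript
// Hypothetical helper: flattens the `no-vulnerable-libraries` Lighthouse audit
// into one row per detected library.
function summarizeVulnerableLibraries(audit) {
  const items = (audit && audit.details && audit.details.items) || [];
  return items.map((item) => {
    const text = (item.detectedLib && item.detectedLib.text) || '';
    // Split on the last "@" so scoped names like "@scope/pkg@1.0.0" still work.
    const at = text.lastIndexOf('@');
    return {
      libraryWithVersion: text,
      library: at > 0 ? text.slice(0, at) : text,
      version: at > 0 ? text.slice(at + 1) : null,
      vulnCount: item.vulnCount,
      highestSeverity: item.highestSeverity,
    };
  });
}

// Usage with a trimmed-down copy of the sample audit.
const vulnAuditSample = {
  details: {
    items: [
      { vulnCount: 4, detectedLib: { text: 'jquery@1.4.4' }, highestSeverity: 'Medium' },
    ],
  },
};
console.log(summarizeVulnerableLibraries(vulnAuditSample));
```

Keeping the library name with and without its version as separate fields is what would make both groupings easy to query later.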
And also the password-inputs-can-be-pasted-into score from Lighthouse, though it may be less relevant on the home page, which is all HTTP Archive crawls. Any way to only include it when the site has a login form field?
"password-inputs-can-be-pasted-into": {
"description": "Preventing password pasting undermines good security policy. [Learn more](https://developers.google.com/web/tools/lighthouse/audits/password-pasting).",
"title": "Allows users to paste into password fields",
"score": 1,
"details": {
"items": [],
"type": "table",
"headings": []
},
"scoreDisplayMode": "binary",
"id": "password-inputs-can-be-pasted-into"
}
And external-anchors-use-rel-noopener score and count of items also from Lighthouse
"external-anchors-use-rel-noopener": {
"description": "Add `rel=\"noopener\"` or `rel=\"noreferrer\"` to any external links to improve performance and prevent security vulnerabilities. [Learn more](https://developers.google.com/web/tools/lighthouse/audits/noopener).",
"warnings": [],
"title": "Links to cross-origin destinations are safe",
"score": 1,
"details": {
"items": [],
"type": "table",
"headings": []
},
"scoreDisplayMode": "binary",
"id": "external-anchors-use-rel-noopener"
}
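A small sketch of reducing either of these binary audits to a score plus an item count, assuming the audit objects look like the two samples above. `auditScoreAndCount` is a hypothetical name, and treating a missing or non-numeric score as `null` is an assumption.

```javascript
// Hypothetical helper: reduces a binary Lighthouse audit to its score and the
// number of flagged items (0 when `details.items` is missing).
function auditScoreAndCount(audit) {
  const details = audit && audit.details;
  const items = details && Array.isArray(details.items) ? details.items : [];
  return {
    id: audit ? audit.id : null,
    score: audit && typeof audit.score === 'number' ? audit.score : null,
    itemCount: items.length,
  };
}

// Usage with the shape of the `external-anchors-use-rel-noopener` sample.
console.log(auditScoreAndCount({
  id: 'external-anchors-use-rel-noopener',
  score: 1,
  details: { items: [] },
}));
```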
@pmeenan not sure if you saw this Slack conversation, but is it possible to configure the run to crawl additional metadata URLs like /.well-known/change-password and /security.txt?
Probably not as part of the crawl since those aren't actually loaded by the page itself. A custom metric might be able to pull them with fetch since they are same-origin but it kind of feels like something that a separate curl script run once against the URL list might be better for.
I can volunteer as an additional analyst if you need it
@aagar you are welcome!
TLS
Security Headers
Cookies
WebAssembly
Lighthouse
Cross-Site-Request-Forgery
Information Leakage
Other
@tomvangoethem @cqueern @bazzadp @edmondwwchan @AAgar
I just migrated the metrics from last year's Almanac and added some new points. Please review it and make some recommendations ^^
@nrllh looks pretty great. This is exciting. I am for sure interested in SRI on subresources so look forward to seeing how we can address it.
SRI is so over-rated and pointless IMHO. If you can self-host you should do. If you can't because it changes frequently, then can't use SRI. So what's the point? Anyway I digress...
@bazzadp - I think it's worth including SRI usage in the chapter. If usage is low, the reasons may be the ones you stated in the comment above, and we could add that as food for thought in the chapter (maybe).
Oh not saying don't include it, just interjecting personal opinion to the conversation. As I said, a digression. Back to chapter planning!
@nrllh your list looks pretty good. Particularly interested in the following metrics but uncertain if it's easy/possible to measure from the HTTP archive dataset:
@nrllh @tomvangoethem - Any thoughts on including Bot detection solutions (e.g. PerimeterX / CyberFend) in scope of this chapter? We can do a quick PR in Wappalyzer for top vendors. I raised a feature request to add 'Security' as category in Wappalyzer - https://github.com/AliasIO/wappalyzer/issues/3226
@edmondwwchan I'm also not sure whether we can measure them. There are currently some limitations on the _requests_ table, so I couldn't check, but I'll add these to the list.
@rockeynebhwani It's a good idea, but I don't know if Wappalyzer gives us enough data. The following query returns five rows, and the distribution doesn't look very meaningful:
SELECT app, COUNT(app) AS count FROM `httparchive.technologies.2020_06_01_desktop` WHERE category = 'Captchas' GROUP BY app

Even if Wappalyzer recognizes in the next weeks (or months) more products for this category, I don't know if the HTTPArchive crawler will have the chance to support it (@bazzadp?)
Few other thoughts:
@nrllh - as I understand from @rviscomi that if we are able to work on the issue I raised on Wappalyzer Github and submit a PR next week, HTTPArchive crawler for 1st Aug will be able to provide us this insight.
@nrllh - WebAuthn adoption will be interesting one. I remember reading eBay implementing it (https://tech.ebayinc.com/product/ebay-makes-mobile-web-login-easier/) and support on Safari web is coming in iOS14, so adoption should increase in coming days.
I don't know how to detect this. I can see webAuthn references on eBay (https://www.ebay.com/signin/)

@senthilp any idea?
@rockeynebhwani there are two main calls for WebAuthn (navigator.credentials.create and navigator.credentials.get). In total, I found 2442 URLs that contain JavaScript files with these calls. But of course, this doesn't mean all of these websites provide this functionality to their users.
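For illustration only, a body scan like the one described could look roughly like this. The regular expression and the name `findCredentialCalls` are assumptions, not the query that produced the 2442 figure. A match only shows that a script references the call, and `navigator.credentials.get` is also used for non-WebAuthn credentials, so this overcounts.

```javascript
// Assumed pattern: matches navigator.credentials.create( or .get( with
// optional whitespace between tokens.
const CREDENTIAL_CALL = /navigator\s*\.\s*credentials\s*\.\s*(create|get)\s*\(/g;

// Hypothetical helper: returns the distinct credential calls found in a
// script body, sorted alphabetically.
function findCredentialCalls(scriptBody) {
  const found = new Set();
  for (const match of scriptBody.matchAll(CREDENTIAL_CALL)) {
    found.add(match[1]);
  }
  return [...found].sort();
}

console.log(findCredentialCalls('navigator.credentials.get({ publicKey: options });'));
```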
Ideally we shouldn't be scanning the response bodies for patterns. It's expensive and flaky. In this case I wonder if there are existing Blink feature counters that we can query. For example maybe CredentialManagerGetReturnedCredential?
@rviscomi it seems there are some security-related feature counters, but it's hard to get the context of these features, they really are not well documented?
@tomvangoethem @cqueern @bazzadp @edmondwwchan @AAgar
I updated the list of metrics. Let's try to get the final version for that please by Tuesday. The core team needs some time to configure the crawler. I also introduced sections for outline (based on our metrics) in our Doc. Please feel free to edit it.
@nrllh @rockeynebhwani The problem with looking at tech like CAPTCHAs is that many of these run on Contact Us forms and the like, most of which are not found on the homepage but on a subpage like /contact-us/.
Since we only look at the homepage of every site we crawl, the data we'd gather and report could be wildly different than what the real usage numbers are.
@nrllh comments on your list
Cookies - think we can do a bit more here, especially as the Cookie chapter was closed. How many cookies are set? What size are they? How many are 1st party, how many 3rd party? How many 3rd party cookies does the average site set? What are the differences between the attributes for 3rd party versus 1st party? I'd imagine most 3rd party analytics don't set the Secure flag, for example. Also we'll need to clarify in this section that the crawl is run from US servers. That applies to the whole chapter (and the whole Web Almanac), but I think especially to cookies, given some regions (like the EU) have much stricter rules on setting cookies without consent and sites are starting to respect those laws.
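A rough sketch of the per-cookie facts these questions need, assuming the raw `Set-Cookie` header value is available as a string. `parseSetCookie` and its output fields are hypothetical. Deciding 1st versus 3rd party is left out because it needs the page and request hosts plus a registrable-domain comparison.

```javascript
// Hypothetical helper: extracts name, size, and security attributes from a
// single Set-Cookie header value.
function parseSetCookie(header) {
  const [pair, ...attrs] = header.split(';').map((part) => part.trim());
  const eq = pair.indexOf('=');
  const name = eq === -1 ? pair : pair.slice(0, eq);
  const value = eq === -1 ? '' : pair.slice(eq + 1);

  // Attribute names are case-insensitive, so normalize them to lowercase.
  const attributes = {};
  for (const attr of attrs) {
    if (!attr) continue;
    const i = attr.indexOf('=');
    const key = (i === -1 ? attr : attr.slice(0, i)).trim().toLowerCase();
    attributes[key] = i === -1 ? true : attr.slice(i + 1).trim();
  }

  return {
    name,
    // Name plus value length in characters (equals bytes for ASCII cookies).
    size: name.length + value.length,
    secure: 'secure' in attributes,
    httpOnly: 'httponly' in attributes,
    sameSite: typeof attributes.samesite === 'string'
      ? attributes.samesite.toLowerCase()
      : null,
    securePrefix: name.startsWith('__Secure-'),
    hostPrefix: name.startsWith('__Host-'),
  };
}

console.log(parseSetCookie('__Host-sid=abc123; Path=/; Secure; HttpOnly; SameSite=Lax'));
```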
Crypto-miner - like this, but why under WASM category? It's happening under JavaScript too.
Information Leakage - presume you're talking about the Server and X-Powered-By type headers? Do we have a list of them? I have:
- Server
- X-Powered-By
- X-AspNetMvc-Version
- X-AspNet-Version

Also, as mentioned previously, that's a long list. Think it's good to run it all, especially as most are already available from last year's queries, but personally I would like to be more selective about what we actually publish, to avoid it just being a listing of settings and instead allow you to give more commentary on a shorter list. Think there's more value in that.
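As a sketch of how such a header list might be applied, assuming response headers are available as a plain name-to-value object. `collectDisclosureHeaders` is a hypothetical name, and the ASP.NET entries use the hyphenated forms the framework sends (`X-AspNet-Version`, `X-AspNetMvc-Version`).

```javascript
// Lowercased names of headers that commonly disclose server software.
const DISCLOSURE_HEADERS = [
  'server',
  'x-powered-by',
  'x-aspnetmvc-version',
  'x-aspnet-version',
];

// Hypothetical helper: returns only the disclosure headers present in a
// response, keyed by lowercased header name.
function collectDisclosureHeaders(headers) {
  const found = {};
  for (const [name, value] of Object.entries(headers || {})) {
    const key = name.toLowerCase();
    if (DISCLOSURE_HEADERS.includes(key)) {
      found[key] = value;
    }
  }
  return found;
}

console.log(collectDisclosureHeaders({ Server: 'nginx/1.18.0', 'Content-Type': 'text/html' }));
```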
@AAgar @tomvangoethem we should look to convert the expensive and unreliable body scanning metrics to custom metrics. The security ones are here and the media analysts have opened a pull request for their custom metrics which will serve as a good template of what to do. This needs to be in place by August 1st so can one or both of you work on it this week? Once you've got used to how to convert the existing ones, that should stand you in good stead to look at the new metrics and see if they need any of the same.
@bazzadp
Cookies...
I'll update the list.
Crypto-miner - like this, but why under WASM category? It's happening under JavaScript too.
WASM chapter is closed and I think, it's better to have it as a category here, we want to also share some stats about WASM's usage (vs. usage by crypto-miner).
Information Leakage
We have them in the table technologies as:
We may also include the categories CMS, Blog etc...
Think there's more value in that.
For cryptominers, I assume that we will base this on URL patterns, i.e. only consider known cryptominers? Are there any metrics on CPU usage when the site is visited? Could be useful to figure out if the cryptominer was started from the start.
I would suggest to change the "Information Leakage" section to "Outdatedness". Showing which version of web application you're running is not really leaking that much information (e.g. compared to having a private key in an HTML comment). Perhaps we could also run a query on the latter by running a regex on all response bodies; there are some useful regexes in the gitleaks repo
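To make the regex idea concrete, here is a simplified sketch that only looks for PEM private key headers. The pattern is an assumption written for illustration and is not copied from the gitleaks repo; `containsPrivateKey` is a hypothetical name.

```javascript
// Assumed pattern: matches PEM headers such as
// "-----BEGIN RSA PRIVATE KEY-----" or "-----BEGIN PRIVATE KEY-----".
const PRIVATE_KEY_HEADER =
  /-----BEGIN (?:(?:RSA|EC|DSA|OPENSSH|PGP) )?PRIVATE KEY(?: BLOCK)?-----/;

// Hypothetical helper: true when a response body contains a private key header.
function containsPrivateKey(body) {
  return PRIVATE_KEY_HEADER.test(body);
}

console.log(containsPrivateKey('<p>nothing to see here</p>'));
```

Running this over every response body is the same expensive body scanning discussed elsewhere in this thread, so it would probably need to become a custom metric or run on a sample.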
FYI we have access to the cryptominers technology detection from Wappalyzer.
I tried to add a bit more structure to the outline to make things more generic; I think it makes it easier to reason about things. Let me know what you think; should I add it to the doc?
- `Secure` attribute on cookies
- `__Secure-` prefix on cookies
- `__Host-` prefix
- `Secure` cookie?
- `*-src` directives)
- `<iframe> sandbox`
- `frame-ancestors`, trusted types, ...)

For custom metrics, maybe we also want to extract all `<meta http-equiv="...">` elements? From the ones that @bazzadp linked, only the integrity attribute is needed.
@agektmr had some ideas for detecting WebAuthn via feature counters. He suggests using CredentialManagerGetPublicKeyCredential. Thank you Eiji!
@tomvangoethem this looks like a great list! I think it's a good idea to add it to the doc to make it easier to comment/iterate on specific parts.
@rviscomi is there such thing as too much content? Or is that a question for later in the process?
Good question @cqueern. I think it's ok to have a lot of content, as long as the content team has the bandwidth to support writing/reviewing/analyzing it. For reference, here's how the chapters looked last year in terms of length:

I like @tomvangoethem's idea on how we can organize the material. From the 2019 introduction, we have these goals for the chapter:
From my point of view, "Drivers of security mechanism adoption" is more like a methodology to explain why a security feature is adopted. Perhaps the measurement results can be useful to conclude this chapter and to give some pointers in subsections to explain some of the observations.
Also, I would suggest adding the discussion of "Bad security practices on the web" as the goal for this year.
@tomvangoethem well done. I added it to the doc for you, with some additional points and comments.
cc @cqueern @bazzadp @edmondwwchan @AAgar
@edmondwwchan As for bad security practices, a few of the metrics might have overlap with current status of security on the web and available features, e.g. RSA vs ECDSA could be considered better security but it'd fall under current status (percentage type result), yet we could also classify using primarily RSA as a bad security practice.
@AAgar we may want to explicitly highlight our goals in the introduction section including the discussions of the current status as well as the bad practices, and both of which should be driven by and sharing the metrics and measurements we settled for this year.
Btw, IMHO I would say RSA is still doing its job and perhaps it's hard to say it's a kind of bad security practice.
I'd agree with that. RSA keys are (as far as I understand) equally secure as ECDSA as long as (the standard) larger key sizes are used. ECDSA (and more importantly serving both) is still not universally supported and is complicated to set up.
RSA versus ECDHE for key exchange mechanism is different however if you meant that? Chrome already flags such sites as using "obsolete connection settings" - though currently only in the Security tab in developer tools rather than in anything more obvious to the casual user. Still, lack of ECDHE in this day and age suggests running older versions of software and/or badly configured software so is a big warning sign IMHO.
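A sketch of how the key exchange could be bucketed from a cipher suite name, assuming IANA-style names such as `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256`. How the crawl actually records cipher suites is not stated in this thread, so the input format, the function name `keyExchangeOf`, and the bucket labels are all assumptions.

```javascript
// Hypothetical helper: buckets a cipher suite name by key exchange mechanism.
function keyExchangeOf(cipherSuite) {
  const name = cipherSuite.toUpperCase();
  if (name.includes('ECDHE')) return 'ECDHE';
  if (name.includes('DHE')) return 'DHE';
  // TLS 1.3 suites (e.g. TLS_AES_128_GCM_SHA256) do not name a key exchange;
  // the protocol always uses an ephemeral (EC)DHE exchange.
  if (/^TLS_(AES|CHACHA20)_/.test(name)) return 'TLS1.3';
  // Static RSA key exchange: no forward secrecy.
  if (name.startsWith('TLS_RSA_WITH_')) return 'RSA';
  return 'unknown';
}

console.log(keyExchangeOf('TLS_RSA_WITH_AES_128_CBC_SHA'));
```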
@AAgar could you request edit access to the doc?
@obto - Agree. Very few sites deploy bot protection site-wide. The only question we will be able to answer is what % of sites have bot protection deployed on the home page.
Or, wishful thinking for the future if the .well-known/change-password standard picks up:
- We can fetch www.example.com/.well-known/change-password using WPT custom scripting
- It should typically be a 301 redirect, and if this exists in the waterfall, we can try to grab cookies from the resultant page.
An example WPT where I tried to fetch www.apple.com/.well-known/change-password using a WPT custom script, as Apple has adopted this - https://www.webpagetest.org/result/200719_DR_36cb98ba5bc7d2b3dfba19511bdf26b3/1/details/#step1_request20
A site is bound to have bot protection on such pages, but we will still miss sites where there is no login function
I tried to come up with a custom metric for /.well-known/change-password
[well-known-change-password]
return fetch('/.well-known/change-password').then(function(r) {
  // Report whether the well-known URL ultimately responded with HTTP 200.
  if (r.status === 200) {
    return "true";
  } else {
    return "false";
  }
});
| Site | Scenario | WebPageTest Link |
| ------------ | ------------- | ------------- |
| https://www.github.com | Has well-known/change-password file and custom metric works | https://www.webpagetest.org/custom_metrics.php?test=200726_JB_c433b392b7dfdce819ccde4a1cbf2ff2&run=1&cached=0 |
| https://www.twitter.com | Has well-known/change-password file and custom metric works | https://www.webpagetest.org/custom_metrics.php?test=200726_2S_395bb8a4569979f7e78180a848c52af4&run=1&cached=0 |
| https://www.burberry.com | Doesn't have well-known/change-password file but mistakenly responds with a 200 redirect, so the custom metric gives a false positive | https://www.webpagetest.org/custom_metrics.php?test=200726_Y4_d15add83e44ae5c369fed9c5fd1ffde8&run=1&cached=0 |
| https://www.apple.com | Has well-known/change-password file but the fetch is blocked by CORS policy. Error (when you try in console) - www.apple.com/:1 Access to fetch at 'https://appleid.apple.com/account/manage' (redirected from 'https://www.apple.com/.well-known/change-password') from origin 'https://www.apple.com' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled. | https://www.webpagetest.org/custom_metrics.php?test=200726_6W_1526d8f44c161ef779acdba4d4324020&run=1&cached=0 |
As per the /.well-known/change-password spec, a site should return a 'Location' header, which I am able to see on GitHub/Twitter/Apple. I thought I could check this to avoid Burberry's false positive, so I tried to read the response headers using the following in the console
fetch('/.well-known/change-password').then(res => {
  // Log every header exposed on the final (post-redirect) response.
  for (const header of res.headers) {
    console.log(`Name: ${header[0]}, Value:${header[1]}`);
  }
});
but this doesn't give me headers for intermediate requests. It only gives headers for the final redirected request, which doesn't have a 'Location' header, and I got stuck after reading this - https://github.com/whatwg/fetch/issues/763 :-) So before I forget everything I did, I thought I'd write this comment in case anybody has any other thoughts.
@bazzadp / @rviscomi / @obto
Can't you fail the custom metric for any request that redirects?
@rviscomi - Actually this one is different from the presence of /.well-known/assetlinks.json, where we could check for a redirect. For /.well-known/change-password, a valid implementation will always mean a redirect with a 'Location' response header, and we can't filter out sites like Burberry because fetch doesn't give me access to the headers that are part of the redirect chain.
@rockeynebhwani The Response that fetch() returns has a redirected property. So something like this might work?
[well-known-change-password]
return fetch('/.well-known/change-password').then(function(r) {
  // Only count a 200 that was reached through at least one redirect.
  if (r.status === 200 && r.redirected === true) {
    return "true";
  } else {
    return "false";
  }
});
This won't fix the apple.com issue for cross-origin redirects though.
@tomvangoethem - This will not work. As per specs
Servers should redirect HTTP requests for an origin's change password url to the actual page on which users may change their password by returning a response with a redirect status of 302, 303, or 307, and a Location header.
@rockeynebhwani checking only the Location header and the response code doesn't reliably tell us whether a site has adopted the .well-known/change-password standard. We should also deal with false positives where origins return a redirect for arbitrary reasons (e.g., redirecting users to a vanity host, incorrect referrers, wrong user agents). On the other hand, we might still be able to identify inappropriate uses of the well-known URL for changing passwords (e.g., r.redirected === false, r.status !== 200, or missing security measures to protect well-known resources).
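One option not tried above is fetch's `redirect: 'manual'` mode. In that mode the browser does not follow the redirect and returns a response whose `type` is `opaqueredirect` (status 0, no readable headers), so it can show that a redirect was issued, even a cross-origin one, but not its status code or `Location`. The sketch below is untested against the crawl; `classifyChangePassword` and its labels are hypothetical, and it does not address redirects issued for unrelated reasons.

```javascript
// Hypothetical classifier for a response obtained with { redirect: 'manual' }.
function classifyChangePassword(response) {
  // The server answered with some redirect; its target is not observable.
  if (response.type === 'opaqueredirect') return 'redirect';
  // A direct 200 is not what the spec recommends (the Burberry-style case).
  if (response.status === 200) return 'ok-without-redirect';
  return 'absent';
}

// Possible use inside a custom metric (browser context, not run here):
// return fetch('/.well-known/change-password', { redirect: 'manual' })
//   .then(classifyChangePassword)
//   .catch(function() { return 'fetch-failed'; });

console.log(classifyChangePassword({ type: 'opaqueredirect', status: 0 }));
```

Keeping the classifier separate from the `fetch` call means the decision logic can be checked with plain objects before it is wired into a metric.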
@nrllh @AAgar @tomvangoethem for the two milestones overdue on July 27 could you check the boxes if:
Keeping the milestone checklist up to date helps us to see at a glance how all of the chapters are progressing. Thanks for helping us to stay on schedule!
@obto Aside from the commented concerns already brought up, it looks good to me. I can get to work on the transport security and cookies analysis this weekend.
I've updated the chapter metadata at the top of this issue to link to the public spreadsheet that will be used for this chapter's query results. The sheet serves 3 purposes:
@AAgar @tomvangoethem how is the analysis going? Bit concerned that there's no PR yet (even a draft one).
I've been quite busy with other things this month; I'll try to catch up in the coming week
@bazzadp thanks for the ping. @tomvangoethem @AAgar please let me know if you need any help/support, I'd also like to try to contribute
@nrllh in case you missed it, we've adjusted the milestones to push the launch date back from November 9 to December 9. This gives all chapters exactly 7 weeks from now to wrap up the analysis, write a draft, get it reviewed, and submit it for publication. So the next milestone will be to complete the first draft by November 12.
However if you're still on schedule to be done by the original November 9 launch date we want you to know that this change doesn't mean your hard work was wasted, and that you'll get the privilege of being part of our "Early Access" launch.
Please see the link above for more info and reach out to @rviscomi or me if you have any questions or concerns about the timeline. We hope this change gives you a bit more breathing room to finish the chapter comfortably and we're excited to see it go live!
@AAgar @tomvangoethem @nrllh Looks like the queries have been written but not added to your chapters data sheet. Can one of you take care of adding the results to the sheet this week?
We are not done yet with the analysis. Once we're done, we can add the results to the sheet. @obto
Hi @AAgar @tomvangoethem , we plan to reuse some of your data in Privacy chapter (cookie attributes in particular). Please let me know if I can help visualise it asap.
Hey @max-ostapenko, I haven't gotten to the visualisation yet :-/ I guess it's fine to have the same figures in different chapters - assuming the interpretation will still be different?
@max-ostapenko I added the results of the cookie attributes to the sheets; you can check the visualisations here. Let me know if you have any comments/suggestions!