Https-everywhere: How involved should the ruleset review process be?

Created on 13 Sep 2016  路  12Comments  路  Source: EFForg/https-everywhere

As a ruleset reviewer, I'm finding that most pull requests for "institutional" sites are usually very incomplete. My usual experience is that I do the Google search trick with the included domains and find sometimes dozens of missed sites, which I then write up as items to be completed. For example:

  1. fnal.gov, government research lab
  2. migros.ch, retailer
  3. th-nuernberg.de, school
  4. udn.com, news site

This has several problems. It takes time. There's no guarantee the contributor will even do the changes, which means my time might have been wasted. I imagine it's an unpleasant surprise to the contributor to discover all this extra work they need to do. For example in (3), the contributor was solving an issue someone else reported that was simply "Add https://www.th-nuernberg.de ", and now I'm asking them to add domains like https://bwc188113.bw.fh-nuernberg.de . Also, in (4), most of the domains to add were HTTPS failures anyway.

Ideally we want to encourage contributors to add as many domains as possible, especially if it's a new ruleset or they are doing a major update. But my current strategy seems too extreme. I think going forward what I'll do is, for new rulesets or major updates, if search trick reveals only a small number of missing domains, I'll note them specifically. If it's a lot of missing domains, I'll ask them to please add more and give them the search to get started. On the second review I'll just work with what they have and not ask them to add any more. And for pull requests that are either targeted fixes or adding a small number of domains, I'll just review what they've given.

@fuglede @Hainish @J0WI What do you all think?

question

All 12 comments

I agree that we should not be that strict for bug fixes or small additions.
For new or rebased rules I think we should try to have full coverage, because we are HTTPS _Everywhere_, so we won't secure users only on ^ and www. urls. It also helps us to find bugs, especially for wildcard rules.

I sometimes make an exception when hosts are to large to check all domains, e.g. governments and universities or when there are just a few hosts left that seems obsolete anyway. In some cases the site offers a page where you find a list with links to all related sections, companies or whatever and you can easily grep them out of the list a use a script to check for cert errors.

I'm not concerned about small sites where maybe the contributor missed forum. or wiki.. That's no big deal. It's the government/university-sized sites that I'm really asking about here.

FYI: This comment I made on pull request #6681 describes some of my review strategy.

As much as I'll commend @J0WI's workflow, I'm generally the opposite and don't bother trying to hunt down all existing subdomains; in most cases, the typical user will only come across the top level domains, and finding and testing them all is rather time-consuming, so I generally consider it more value-adding to be able to use that time to go through more entries in the backlog.

No rule without exceptions though, and if a subdomain plays a significant role for a given site, I might suggest adding it. Something I've done a couple of times for edge cases -- and which is probably more sensible for older PRs that are unlikely to be updated, than it is for new ones -- is to merge a PR but note that there are some subdomains that could be added in a follow-up PR if the contributor considers it worthwhile.

Smart HTTPS
HTTPS by default
ConsistentHTTPS

In my opinion, If I realy want to 鈥滺TTPS Everywhere鈥滐紝I would use them above.
Sites will change their settings again and again. We should not waste too much time for the HTTPS failures or obsolete hosts.

I agree that listing broken sites in a comment can be a lot of work and I'm not really sure why we do it. udn.com is an especially bad example. I guess the main reason is that it helps writers and reviewers organize their list. If I identify 10 new subdomains, and I'm testing them all anyway, then I might as well note the five that don't work along with the five that do.

I don't think we need to specifically break down why certificates are invalid though, between self-signed and mismatch and expired and anything else. Invalid certificate covers it.

@renyouguo HTTPS Everywhere is the only HTTPS tool that goes into Tor, and it's highly discouraged for Tor users to install other add-ons. So HTTPS Everywhere shouldn't assume users are free to install other add-ons for extra functionality.

@jeremyn Sorry for my poor English. I don't mean users install other addons. What I want to say is: we can cover more sites by the time waste in the subdomains which don't work.
List all of them should be worked by tools, if it's realy important.

While I was reviewing #7488, I found that the comments would be hundred times larger than the actual target list if I asked the contributor to add all of problematic subdomains.

Most of subdomains in this case are just redirecting to www. I tried to summarize them, but some subdomains don't seem to have an intuitive rule
(e.g. ab-berlin.profiseller.de => http://www.profiseller.de/shop1/mega/index.php3?shop=1&ps_id=P17814362)

I have no idea how to progress this, should I just merge it as-is?

Welcome to being a reviewer, @gloomy-ghost .

It looks like those subdomains are mostly referral sites to the main store, is that right? In that case I wouldn't list them individually, especially if they don't work. You might add a comment to the top describing the situation without listing all the specific sites.

@gloomy-ghost it's also the case that the comments are not included in the extension itself. I think it's acceptable to add long comments since the only ones downloading this is devs, and it won't add any to the download size for the end user.

I disagree with adding long comments by default. Comments add a maintenance burden going forward, for example see https://github.com/EFForg/https-everywhere/pull/7344 , which is an open pull request that's stalled because of a disagreement over what to do about confusing, outdated comments related to a complex exclusion.

In this current pull request, a long, specific comment means that anyone doing maintenance on this ruleset will be obligated to either review 250+ non-working subdomains, or let the comment grow out of date and increasingly difficult to manage.

This issue has been open long enough and there's no active discussion, so I'm closing it.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

apple-web-evangelist picture apple-web-evangelist  路  4Comments

cschanaj picture cschanaj  路  4Comments

a0193143 picture a0193143  路  4Comments

diracdeltas picture diracdeltas  路  3Comments

anonsubmitter picture anonsubmitter  路  5Comments