Unlike the platform attribute, there is not any grammatical requirement on the default_offattribute, This make the process of performing automated check to re-activate disable ruleset difficult.
To illustrate the situation, we have mismatch, mismatches, mismatched and their capitalization variants for the mismatched error. These variants make it hard for contributors to perform audit. See #9582, #9842
A particularly bad example could be BufferedIO.xml. It should be a platform=cacert ruleset, but the message is given in default_off. (It is a mismatch now)
The goal is to standardize the default_off attributes and to enforce the grammar with relaxng.xml such that automated audit can be done easier.
Modify default_off attributes in the existing ruleset to an agreed set of keyword, add the ruleset to ruleset-coverage-whitelist.txt if necessary. See https://github.com/EFForg/https-everywhere/pull/9884.
Update CONTRIBUTING.md to explain the keyword we use.
Enforce the grammar with relaxng.xml.
Each keyword should be separated by a comma and a space
(keyword1|keyword2)(, (keyword1|keyword2))*
Some rule use ' (single quote) instead of " (double quote), it will be great if you can do a replacement as wel; :)
List of keywords (feel free to suggest new ones)
regional
refused
timeout
cert-algo
cert-chain // curl: (60) SSL certificate problem: unable to get local issuer certificate
expired
self-signed
mismatched
ssl-error // reset, or any other ssl error not stated above
status-unexpected // 200 Error pages
status-others // other than 200, i.e. 4xx/5xx
loops // secure connection redirects to plaintext
content-different // visual/ contextual difference, also for different status code
breaks-site // in a board sense.
breaks-third-parties
cors // CORS issues
ruleset-test-failed // failed ruleset test (avoid space within keywords)
request-owner
request-user
others // details should be stated in the ruleset comment
Updated 2017.06.28 Please refer to default-off-attributes.txt
Aren't cert-chain and cert-local-issuer the same thing?
Certificates are invalidNo HTTPS support anymoreSite does not support HTTPS currently@Bisaloo You're right, I will modify the list according.
mismatched and status-mismatched?SSL_ERROR_NO_CYPHER_OVERLAP or SSL_ERROR_RX_RECORD_TOO_LONG like for urlquery.net)?@Bisaloo
mismatched refers to a certificate mismatched error.
status-mismatched refers to the situation where HTTP and HTTPS response clients with different HTTP status code, like HTTP (200) and HTTPS (404 with valid certificates). The problem is that it breaks Travis build and requires some test url. Now I think it should be merged with ruleset-test-failed, do you have any preference or suggestion?
I would like a ssl-error keyword to be used in the situation you mentioned (and any other complex SSL errors not stated above) because these errors make little difference to the ruleset auditing process (please correct me if I am wrong).
Remark
audit-required from the list of keyword because it is too ambiguous. status-mismatched refers to the situation where HTTP and HTTPS response clients with different HTTP status code, like HTTP (200) and HTTPS (404 with valid certificates). The problem is that it breaks Travis build and requires some test url. Now I think it should be merged with ruleset-test-failed, do you have any preference or suggestion?
I see. I would put that as content-different.
I would like a ssl-error keyword to be used in the situation you mentioned (and any other complex SSL errors not stated above) because these errors make little difference to the ruleset auditing process (please correct me if I am wrong).
No, I agree. I think we can also merge reset with ssl-error.
Alright. Let's put Different HTTP status code as content-different and reset as ssl-error.
Updated I think that we will need add a section for the default_off attributes in CONTRIBUTING.md when we are closed to resolving this issues.
I'm not really happy with default_off="regional". Maybe a user wants to enable/disable only the rules for his own region, but not for other regions.
EDIT: Sorry, accidentally closed.
What exactly is the "ruleset auditing process" mentioned in https://github.com/EFForg/https-everywhere/issues/9906#issuecomment-301301023? The specific process should make a difference in what keywords we want. (And if it doesn't make a difference, then what's the point of this issue anyway?)
I'm not sure what regional is for. I guess it's for the "only available in China" problems? Chinese sites are a special case due to the Great Firewall. We should get @gloomy-ghost 's and @ivysrono 's input on what they would like to see for that, though I know they are both busy with other things right now.
We should have only a small number of categories. Each extra category is one more thing to get confused about and one more potential argument to have with contributors.
For a while now in reviewing rulesets I've preferred invalid certificate to a more specific categorization in ruleset top comments. Here I would combine expired, self-signed, mismatched, cert-chain, and cert-algo into invalid-certificate.
I think other is slightly better wording than others.
Small, infrequent problems l should be moved into other. For example if there are only a few CORS rulesets, then cors should be included under other.
I don't know what request-owner and request-user mean.
We might want to allow extra attributes, such as date-expired="2017-05-14" for invalid-certificate.
Without knowing more about what an audit looks like, I would prefer just these keywords: regional (to cover "only available in/always breaks in X region"), refused, timeout, invalid-certificate, other, and possibly cors if there are a lot of default_off for those.
I just want to add that I think it's great people are working on automating the review process :smiley:
@jeremyn thank you for joining the discussion!
What exactly is the "ruleset auditing process" mentioned in #9906 (comment)?
To me, the audit process means to re-activate (or delete) the default_off rulesets if possible (either automatically/ manually). We have around 4k default_off ruleset and quite a number of them have not been re-activate for years. When a cleanup is necessary, the huge number of variants of the default_off make the process troublesome.
In particular, a contributor might want to focus on a specific type of default_off, e.g. an expired is much more likely to be fixed by web-master than a refused. And a cert-chain might require contributor to use a new Firefox profile. Without a consistent default_off message, a large portion of time will be spent on searching for ruleset.
I'm not sure what regional is for. I guess it's for the "only available in China" problems?
Yes. There is only 5 files will be applying the regional keyword AFAIK. See https://github.com/EFForg/https-everywhere/pull/9929. @gloomy-ghost, @ivysrono please let me know what do you prefer.
We should have only a small number of categories.
I agree that we should have fewer categories. Suppose that we have an automated process since they are basically the same to a robot (or a script), but I am not sure if this will affect manual audit or not. (I use a bot to re-generate a ruleset for a default_off host almost every time I would like to perform an audit and I do create default_off ruleset myself). :)
I am OK with expired, self-signed, mismatched, cert-chain and cert-algo merged into the category cert-invalid if there is no problem with that.
I don't know what request-owner and request-user mean.
request-owner mean web-master/ site owner want to have the ruleset disabled, like https://github.com/EFForg/https-everywhere/pull/9528
request-user https-everywhere user reported that the ruleset is causing problems, like https://github.com/EFForg/https-everywhere/issues/9371
We might want to allow extra attributes, such as date-expired="2017-05-14" for invalid-certificate.
If adding extra attributes is an option, I would really love to see an date-expired attribute. Beside, we might want to have something like phase-out="2022" (YYYY+5) to schedule the removal of default_off ruleset.
I would prefer just these keywords: regional (to cover "only available in/always breaks in X region"), refused, timeout, invalid-certificate, other, and possibly cors if there are a lot of default_off for those.
Which category should failed ruleset test go into in this case? We have 1.7k such ruleset (nearly half of the the default_off
@cschanaj I get the gist of what the auditing process is -- someone does a manual or semi-automated search over the rulesets looking for categories of default_off -- but the specific process/program/script used and its goals matter. For example if we want a detailed analysis then we should use fine-grained categories, but if we are trying to quickly identify "junk" rulesets then maybe broad categories are better.
Perhaps more compelling is that I don't like the idea of the hassle of constraining default_off values to support an auditing program that we can't see and have no input into. We don't want to have to justify these categories to contributors on the sole basis of "@cschanaj says this helps them with their auditing script" (no offense intended to you). Even worse is imagine in five years, suppose none of us here are involved with the project any more, and the future maintainers see the categories in the XML grammar but have no idea why they are there and what may break if they change them or if they add new categories. It introduces a dependency that we can't publicly see.
For regional, maybe we should mention China (gfw-of-china-issues or something) specifically since it is so far the only region that has special issues.
request-owner could instead be at-website-admin-request, so that default_off="at-website-admin-request" reads in a nice way.
request-user is not a valid category, we should only disable a ruleset if there is a verifiable reason or the site owner requests it. We wouldn't say default_off="someone-reported-a-problem".
I disagree with a phase-out attribute, we don't need to include a specific policy like a five-year grace period in the ruleset. Also doing even that simple math by hand would be tedious and error-prone. People doing the later analysis can decide if a ruleset should be disabled based on policy or common sense.
failed ruleset test is also not a valid category, it basically means broken. Since you say there are 1.7k of these rulesets then unfortunately unless a robust automated review program appears, no one is going to systematically review them all, so I suggest we mark them as unknown-legacy or something as a one-off. Future maintainers can delete them as a group if they become too burdensome to keep for some reason.
@Hainish This issue is getting slightly complicated and has future maintenance implications, so you may want to review this and provide feedback.
Please note that we are displaying the value of default_off as a part of UI on Chrome:

Also, I agree with @J0WI 's comment at https://github.com/EFForg/https-everywhere/issues/9906#issuecomment-301322042. The comment of a regional issue should be more specific so that a user can determine if it's working or not before enabling it. (btw, can we automatically enable all rulesets for a region per user settings?)
For a while now in reviewing rulesets I've preferred invalid certificate to a more specific categorization in ruleset top comments. Here I would combine expired, self-signed, mismatched, cert-chain, and cert-algo into invalid-certificate.
Without knowing more about what an audit looks like, I would prefer just these keywords: regional (to cover "only available in/always breaks in X region"), refused, timeout, invalid-certificate, other, and possibly cors if there are a lot of default_off for those.
I think it does make sense to use the same categorization as in the top comment (although I would still keep more than what you suggested, like different-content).
I would keep certificate chain issues separate though as it is easy to overlook it and it requires extra care (those subdomains always work on my default firefox profile).
I did not aware that the default_off attribute will be visible to Chrome users. In this case, we should try to keep the default_off simple to avoid confusing them.
What do you think about the following set of keywords? I am ok having refused and timeout merged into something like connection-failed. Similarly. with cors merged into broken.
regional // let's use this temporarily
refused
timeout
ssl-error
cert-chain // curl: (60) SSL certificate problem: unable to get local issuer certificate
cert-invalid // any other certificate problem
cors // CORS issues
broken // both break-site/ break-third-parties, different-content, failed ruleset test
at-website-admin-request
other // details should be stated in the ruleset comment
As @jeremyn mentioned, the biggest challenge we have is the lack of any automated reviewing tools available. I have no idea how this can be solved in the near term.
btw, can we automatically enable all rulesets for a region per user settings?
If this is possible, maybe we can simply get ride of regional.
I agree with @gloomy-ghost
I set 5 user_pref for Only avaiable in China now, do you support only one pref to enable them all?
BTW, I'd like to enable all mixcontent rules by one click/pref because of https://github.com/ivysrono/UpgradeMixedContentBlacklist / #8506
Does upgrade-insecure-requests resolves cors issues as well?
@cschanaj Sometime, yes.
Certificate chain problems can be noted in the top comment. I think it's better to merge cert-chain into cert-invalid.
I may not have been clear in my earlier comment https://github.com/EFForg/https-everywhere/issues/9906#issuecomment-301363731, but I don't think we should have broken either. I prefer unknown-legacy for the old 1.7k -- the wording is awkward, I know, suggestions welcome -- and future broken/failed ruleset test should be other.
@jeremyn How about using legacy for those 1.7k old ruleset?
regional // let's use this temporarily
refused
timeout
ssl-error
cert-invalid // any other certificate problem
cors // CORS issues
legacy // failed ruleset test
at-website-admin-request
other // details should be stated in the ruleset comment
Sure, that's fine. Thanks @cschanaj .
I should say, that's fine with me. We should really get @Hainish 's opinion before anyone does mass updates according to your list.
I'll take a look at this issue tomorrow.
For a while now in reviewing rulesets I've preferred invalid certificate to a more specific categorization in ruleset top comments. Here I would combine
expired,self-signed,mismatched,cert-chain, andcert-algointoinvalid-certificate.
I disagree in this point. For me they are not the same:
cert-algo just won't work in modern browsers. The pref to override this (in the case of SHA-1) will be removed soon.cert-chain is a minor issue and does not affect all users. If you are aware of this problem you can easily fix it yourself. Maybe we can turn this into a platform?expired is basically something the webmaster should be aware of. But an expired certificate is not per se insecure (see revoked).mismatched can be handled like a self-signed cert. But in most cases it's just dangerous and may allow doing nasty things with your session.self-signed can work like CAcert, but is not practicable for public web services.BTW, I'd like to enable all mixcontent rules by one click/pref because of https://github.com/ivysrono/UpgradeMixedContentBlacklist / #8506
extensions.https_everywhere.enable_mixed_rulesets = true (Firefox)
By default we should keep things simple, which means just cert-invalid. The different variations on cert-invalid are not technically the same but the question is whether the burden of having different categories is worth the trouble in the specific context of default_off values. Anyone who is writing an automated checker can distinguish between the variations based on the error reported by fetch_test.sh if they need to, so there's little advantage in making human editors use different values.
I should note that the discussion around browser prefs will be obsoleted in the transition to WebExtensions. Extensions built in this newer API will not be able to do pref introspection, see https://wiki.mozilla.org/WebExtensions/FAQ#Will_I_have_access_to_about:config_or_the_preferences.3F. This transition will happen in the next few months.
As far as the default_off string goes, since this is exposed in the WebExtensions UI, I'd be in favor of keeping it descriptive but concise. Without user testing, I'm unsure how many users ever see the default_off strings, but I would guess it isn't very many. Still, it is exposed and should be treated as user-facing - thanks @gloomy-ghost for pointing this out.
The comments in the rulesets are for maintainability only, and never make it into the actual extension, so we can't rely on any mechanism of scanning comments for in-extension functionality.
We already have the platform attribute which is underutilized, mainly to indicate which rules should be enabled when mixed content blocking is disabled. This attribute seems appropriate when indicating certain types of rules, such as those that should be enabled on platforms within the GFW. We could extend the grammar of platform so that it indicates a comma-separated set of values, which we can document within CONTRIBUTING.md. Using this attribute we can then parse those values and take appropriate in-extension action.
I'm generally in favor of having a high level of granularity when documenting the certificate errors, for the reason described here: https://github.com/EFForg/https-everywhere/issues/9906#issuecomment-301346982. Expired certs are more likely to be renewed quickly than, say, self-signed certs. But I will defer to the ruleset maintainers on what level of granularity is useful or appropriate here.
@Hainish Is it possible to override what the user sees from the tooltip hover? For example, we write cert-invalid but the user sees Invalid certificate. Keep in mind there are thousands of these rulesets so this basic level of abstraction is important. We don't want pull requests changing thousands of files to modify a line of descriptive text.
Out of all these categories we've come up with, only regional makes sense to me as a platform, and only then if we actually specify the region e.g. china. I disagree with @J0WI that cert-chain is a good platform. As I understand it, whether a specific user gets a cert chain error for a particular site depends on whether they have randomly visited another site that is correctly configured and serves the same intermediate certificate that the browser then caches. It doesn't make sense to enable all of these rulesets at once.
Sort of tangential but "comment maintenance" is a disproportionately large and tedious part of ruleset work. The comments are not standardized in either form or content and are not currently reviewable by automated testing. Recently a pull request was unmerged for almost two months because of an extremely minor comment problem that I waited for the contributor to make, until I finally merged it as-is and opened another PR for the minor change. At the same time I can't remember any case where a big list of broken subdomains made a real difference in maintaining a ruleset. It's a lot of work for little gain, not a pattern I want to see replicated here (or anywhere else).
We already have the platform attribute which is underutilized, mainly to indicate which rules should be enabled when mixed content blocking is disabled. This attribute seems appropriate when indicating certain types of rules, such as those that should be enabled on platforms within the GFW.
@Hainish I agree that we should have regional a platform if it is possible to activate them by default for china users behind the GFW. However, we lack a measure to tell the users to _disable-this-rule-when-you-are-on-a-trip_ since platform is not exposed to the users. Any advise/ solution to this?
Is it possible to override what the user sees from the tooltip hover? For example, we write cert-invalid but the user sees Invalid certificate.
@jeremyn I would like to see this enhancement happening as well, this improve readability of the default_off messages. Though I would like to noted that the default_off message is not visible to Firefox users AFAIK, so it will be a chromium-only feature. Any chance the tooltip hover (or something similar) will be available on Firefox also?
We don't want pull requests changing thousands of files to modify a line of descriptive text.
Apart from this issue, we might get exceptionally lengthy default_off if a domain is poorly configured, for example
<ruleset name="Foo.com" default_off="refused, mismatched, self-signed, cert-chain, cert-algo, ssl-error">
instead of
<ruleset name="Foo.com" default_off="refused, cert-invalid, ssl-error">
I do not see which category Ghostery_metrics_optout.xml should fall into. The only rule redirect d.ghostery.com (with a working HTTPS connection) to localhost.
P.S. Personally, I do not think this ruleset match the goal of _https-everywhere_ because it blocks connections rather than redirecting users to secure connections. Any suggestion?
BTW, #10282 #10288 #10292
There are just 2 ruleset that only available in China now.
Can we settle on a set of keywords to use in the default_off field? I am okay with whatever solution is chosen but I'd like to see this resolved.
We can't really start to work on the rulesets as long as we haven't decided which keywords to use and each day new rulesets with potentially new default_off reasons are pushed.
I don't see the urgency. No one is creating new default_off rulesets as far as I know.
For:
We can't really start to work on the rulesets
What does this mean?
@Bisaloo currently, I work on this issue by either re-activating the ruleset/ deleting the obsolete ruleset. IMHO, it is fine if someone create default_off ruleset as usual at the moment, _as long as it is not a massive update_ like #9884 . If you prefer some suggested keywords, https://github.com/EFForg/https-everywhere/issues/9906#issuecomment-301509445 should work well.
In fact, I did some simple statistics on this issue in #10378. It looks really good to me, so I do not think it is urgent to resolve this.
What about redirect loops? For example Strawpoll.me.xml.
@Bisaloo I think that other from https://github.com/EFForg/https-everywhere/issues/9906#issuecomment-301509445 is good enough for this.
Reduce CN only rule as much as possible
@zoracon Do you think this should be enforced by ruleset checker at some point?
@zoracon Do you think this should be enforced by ruleset checker at some point?
To be honest, not sure. I want to go over the ruleset checker at some point after I finally get the fetch test issues resolved. But closing this for now since it seems like this came to a decent conclusion.