For example entering *.fakesubdomain.mozzilla.org will disable mozilla.org.
By "disabled", do you mean "whitelisted"? Want to make sure I'm in the right spot
Correct, this is about the feature introduced in #624. I guess the problem is that the wildcard shouldn't apply to the whole base domain, just the subdomain to which the wildcard was attached.
I'd like to take a look at this issue if no one else is working on it.
Sounds good, let me know if you have any questions. I might take a while to respond over the next week though.
I want to make sure I understand the codebase and problem correctly:
User disables PB on *.fakesubdomain.mozilla.org
User wants PB enabled on mozilla.org but the wildcard filter catches it
Based on my investigation, the current fix/logic in #624 produces a false positive with this snippet:
if(getBaseDomain(site) === getBaseDomain(origin)){
return false;
}
getBaseDomain(*.fakesubdomain.mozilla.org) // mozilla.org
getBaseDomain(mozilla.org) // mozilla.org
This issue is a bit beyond me as a newbie, but my first instinct is to check the wildcard's first domain (split on ".") and compare it to the origin's first domain. (i.e. Does "mozilla" match "fakesubdomain"? No, therefore PB is enabled on "mozilla.org")
Additional guidance is much appreciated.
A good first step would be to add a bunch of wildcard-testing scenarios to isPrivacyBadgerEnabled unit tests in src/tests/tests/utils.js. Some of these would pass (basic wildcard functionality), and some would fail (the scenario in this issue). The tests will help us be confident we are not introducing unwanted changes when we update the code.
It doesn't seem like we should be using getBaseDomain for wildcard comparisons ... The base domain of somesite.googlecode.com is somesite.googlecode.com since googlecode.com is a Public Suffix List top-level domain (TLD). This means whitelisting *.googlecode.com does not match somesite.googlecode.com with our current code, but I think it should. Let's add an assertion for and fix this scenario as well.
I'm thinking for wildcard entries, we want to see if the origin we are testing ends with the whitelisted wildcard domain, minus the wildcard. So, if *.mail.example.com is whitelisted and we are on www.example.com, does www.example.com end with .mail.example.com? No. If we are on web.mail.example.com though, yes. Just some quick thoughts, let me know what you think. Feel free to open a PR so that we can discuss specific code changes.
Be careful @tashachin, a naive implementation like the one @ghostwords describes in https://github.com/EFForg/privacybadger/issues/1519#issuecomment-398550020 would be linear in the size of the whitelist. For large whitelists, and many checks, this could have a real performance impact.
For a constant time solution, you could use a suffixtree. I wrote this implementation that I use in Privacy Possum.
Replacing getBaseDomain with string lookups will incidentally improve performance. There is no need to bring in new data structures here, not yet anyway.
@ghostwords In your comment https://github.com/EFForg/privacybadger/issues/1519#issuecomment-398544910 you suggested that I create unit tests. I wanted to run the pre-existing tests first to familiarize myself with QUnit.
However, when I tried to run the tests in-browser, I came upon an issue. When I try to replace /skin/options.html with /tests/index.html, I get a "File not found" error:

Am I missing something or should I open a new issue?
You have to load the extension from source in developer mode first. We don't ship unit tests with the version we publish on Chrome Web Store.
I fixed the docs for running unit tests in 7bde99250440de3e53acc306fddce397ef6a0bd8.
Most helpful comment
I fixed the docs for running unit tests in 7bde99250440de3e53acc306fddce397ef6a0bd8.