Privacybadger: Include more heuristics

Created on 14 Dec 2017 · 13 Comments · Source: EFForg/privacybadger

Could anything else train PB to be more aggressive? For example, if Firefox's canvas protection warnings or first-party isolation are triggered, assign a higher weight toward blocking that domain. Or weigh the use of certain web standards, or whether third-party scripts and fonts are used?
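To make the weighting idea concrete, here is a minimal sketch of what a weighted heuristic could look like. Everything in it is an assumption for illustration: the signal names, the weights, the threshold, and the class itself are hypothetical and are not Privacy Badger's actual code. With equal weights of 1 it reduces to a plain count of distinct sites.

```python
from collections import defaultdict

# Hypothetical per-signal weights; illustrative numbers only.
SIGNAL_WEIGHTS = {
    "cookie": 1.0,
    "canvas_fingerprinting": 2.0,
    "third_party_font": 0.5,
}
BLOCK_THRESHOLD = 3.0  # assumed score at which a domain would be blocked


class WeightedTrackerScore:
    """Accumulate a per-domain score from signals seen on distinct first-party sites."""

    def __init__(self):
        # tracker domain -> {first-party site -> strongest weight seen there}
        self._seen = defaultdict(dict)

    def record(self, tracker, first_party, signal):
        weight = SIGNAL_WEIGHTS[signal]
        sites = self._seen[tracker]
        # Each first-party site counts once, using its strongest signal.
        sites[first_party] = max(sites.get(first_party, 0.0), weight)

    def score(self, tracker):
        return sum(self._seen.get(tracker, {}).values())

    def should_block(self, tracker):
        return self.score(tracker) >= BLOCK_THRESHOLD
```

Under these assumed weights, a domain seen setting a cookie on one site and fingerprinting via canvas on a second site would already reach the threshold, where a plain site count would still be waiting for a third site.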

As is, Privacy Badger still feels like it takes a relaxed approach to blocking. For instance, I turned off all other tracking protection features in Firefox, disabled any ad/analytic blockers and opened my 80 or so bookmarks, then went to reddit and opened another 50 posted links to "prime the pump" of Privacy Badger.

* It would be nice to see total counts of reds/greens/yellows on the Tracking Domains tab
* It would be nice to see a hit count of each site's entry on the Tracking Domains tab
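As a rough sketch of the two statistics requested above, the tallies could be computed like this. The action names, the color mapping, and both function names are assumptions made for illustration, not taken from the extension's code.

```python
from collections import Counter

# Assumed action names and slider colors, for illustration only.
ACTION_COLORS = {"block": "red", "cookieblock": "yellow", "allow": "green"}


def color_totals(domain_actions):
    """Return how many domains fall under each slider color."""
    totals = {color: 0 for color in ACTION_COLORS.values()}
    for action in domain_actions.values():
        totals[ACTION_COLORS[action]] += 1
    return totals


def hit_counts(request_log):
    """Count how often each domain appears in a list of requested domains."""
    return Counter(request_log)
```

Here `domain_actions` is a hypothetical mapping from tracking domain to its current action, and `request_log` is a hypothetical list with one entry per observed request.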

If you're going to stop recording non-tracking domains, it's going to list fewer things to block. If disabling the check of web pages against EFF's DNT policy boosts performance, what does that check actually do? Does it make some sort of network request to the EFF to check something? Could that be rolled into local detection instead?

DNT policy enhancement heuristic performance question task ui

jawz101

👍1

Most helpful comment

For what it's worth, I just ran a few tests comparing Firefox with different settings and extensions.
I loaded 25 sites at least 5 times each and cleared the cache between tests.

Privacy Badger with a trained profile:
[screenshot: privacy_badger]

Tracking Protection (basic list), built-in Firefox feature:
[screenshot: tracking_protection_basic]

Tracking Protection (strict list), built-in Firefox feature, plus network.dns.disablePrefetch=true and network.predictor.enabled=false:
[screenshot: tracking_protection_strict]

uBlock Origin with a few changes to which lists are used and cosmetic filtering disabled:
[screenshot: ublock_mylists_nocosmetics]

Same uBlock setup as above but in "medium mode", and I unbroke a few sites as I went. Medium mode disables third-party scripts and frames, so I had to go back and allow a few common things to get media to show up:
[screenshot: ublock_mylists_nocosmetics_medium_mode]

jawz101 on 14 Dec 2017

👍2

All 13 comments

There are a bunch of things going on in this issue, which is fine, but I suggest filing targeted follow-up issues, a separate issue for each specific suggestion, after our conversation here.

ghostwords on 14 Dec 2017

Tweaking the way our heuristic works to detect and prevent tracking more quickly: Good idea, and something we should work on once we get existing heuristics to a more stable place. For example, we seem to have trouble learning to block Google Analytics (#367), the most common third-party tracker. I would say tweaks and improvements will have to come after serious bug fixes.

ghostwords on 14 Dec 2017

Adding interesting statistics to the options page. Yes! Excellent idea.

ghostwords on 14 Dec 2017

Regarding #1795, no longer recording non-tracking domains will not change what gets shown in the popup or on the options page. Tracking Domains on the options page already doesn't list non-tracking domains. The popup will continue to display what it displays now, the way it displays it now.

ghostwords on 14 Dec 2017

Checking if domains comply with EFF's Do Not Track policy makes requests to check for presence of /.well-known/dnt-policy.txt. For more information, see EFF's Do Not Track (DNT) Policy guide.

Making these requests comes with overhead. We worked and will continue working on reducing this overhead. For example, #1795 will help by no longer issuing these requests to non-tracking domains.
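A minimal sketch of the request-reduction idea described above: build the policy URL per domain, and only do so for domains that were observed tracking. The `https://` scheme and both function names are assumptions for illustration; how the fetched file is then validated is not covered here.

```python
DNT_POLICY_PATH = "/.well-known/dnt-policy.txt"  # path named in the comment above


def dnt_policy_url(domain):
    """Build the URL that would be requested to look for a domain's DNT policy."""
    # Assumes HTTPS; the scheme is not stated in this thread.
    return "https://" + domain + DNT_POLICY_PATH


def domains_to_check(seen_domains, tracking_domains):
    """Keep only domains observed tracking, so non-trackers trigger no request.

    Order of first appearance is preserved and duplicates are dropped.
    """
    tracking = set(tracking_domains)
    result = []
    for domain in seen_domains:
        if domain in tracking and domain not in result:
            result.append(domain)
    return result
```

With this filter in place, the number of policy requests scales with the number of tracking domains rather than with every third-party domain encountered.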

ghostwords on 14 Dec 2017

Tested with 25 sites I knew or guessed would be turds. Notice the disconnects in the last shot. The only things in Firefox Lightbeam that linked a couple of sites were ads.twitter.com and trbas.com (LA Times and Chicago Tribune won't display images w/o trbas.com).

http://www.androidpolice.com/
https://www.aol.com/
http://www.avsforum.com/
http://www.chicagotribune.com/
http://www.cnn.com/
https://www.merriam-webster.com/
http://www.foxnews.com/
https://www.huffingtonpost.com/
http://www.imdb.com/
https://lifehacker.com/
http://www.latimes.com/
https://www.msn.com/
https://www.theguardian.com/us
https://www.pcworld.com/
https://www.cnet.com/
https://www.bible.com/
https://www.snopes.com/
https://sourceforge.net/
https://www.nytimes.com/
http://time.com/
http://www.tmz.com/
http://www.tomshardware.com/
https://www.usatoday.com/
https://www.vice.com/en_us
https://www.washingtonpost.com/

jawz101 on 14 Dec 2017

#2114 is a concrete way we could get started on enhancing tracker detection.

ghostwords on 25 Jul 2018

@ghostwords @bcyphers

I don't want to muddy up your https://github.com/EFForg/privacybadger/issues/2114 but I wanted to run a couple of utilities and a question by you.

Question first: would it be possible or meaningful to factor in SSL certificate information? I've been curious whether some CAs are more malware-friendly than others, or whether, say, once one domain gets blocked, all future domains belonging to the same organization could be assigned a higher weight in the heuristic.

If anything, it's just my general curiosity to see if SSL certs reveal anything about the sorts of tracking companies. Since most SSL certs come with a cost, I would think they don't invest much money in separate certs for each of their domains/subdomains, so it might be a way to establish equivalency amongst domains.
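To illustrate the equivalency idea, here is a small sketch that groups domains by the organization named in their certificate. It assumes certificates are already available as dicts shaped like the output of Python's `ssl.SSLSocket.getpeercert()`; the function names are hypothetical, and a browser extension would have to get the same data through whatever certificate API the browser exposes.

```python
from collections import defaultdict


def cert_organization(cert):
    """Return the subject organizationName from a getpeercert()-style dict, or None."""
    # "subject" is a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "organizationName":
                return value
    return None


def group_by_organization(certs_by_domain):
    """Map each organization to the sorted list of domains whose cert names it."""
    groups = defaultdict(list)
    for domain, cert in certs_by_domain.items():
        org = cert_organization(cert)
        if org is not None:  # domain-validated certs usually carry no organization
            groups[org].append(domain)
    return {org: sorted(domains) for org, domains in groups.items()}
```

One caveat with this approach: certificates without an organization field simply fall out of the grouping, so it would only link domains whose owners use organization-validated certs.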

jawz101 on 25 Jul 2018

oh, and the utilities. Have you seen PyFunceble and OpenWPM?
https://funilrys.github.io/PyFunceble/
https://github.com/funilrys/PyFunceble

https://webtap.princeton.edu
https://github.com/citp/OpenWPM

jawz101 on 25 Jul 2018

Thanks for the pointers as always! I opened https://github.com/EFForg/badger-sett/issues/21 to investigate using PyFunceble as an easy way to speed up our crawler. We are fans of OpenWPM and the research papers it helps produce.

ghostwords on 25 Jul 2018

SSL certs: Looks like there is a tlsInfo webRequest-extending API coming up in Firefox 62 (and Chrome?) that lets WebExtensions inspect certificate details. So we could see what useful/interesting information we could get from certificates at some point. My feeling is there is plenty of lower-hanging fruit elsewhere in terms of improving our detection techniques, but I dunno, it's worth checking out. Feel free to open a new issue!

ghostwords on 25 Jul 2018

lol I just like bending your ear when I have these brain farts. If there's a mailing list I'd be glad to throw my random thoughts in there :)

jawz101 on 25 Jul 2018
