Brave-browser: Prevent tracking based on link decoration via query string or fragment

Created on 25 Apr 2019 · 17 Comments · Source: brave/brave-browser

ITP 2.2 caps the lifetime of cookies set via document.cookie at one day when the navigation came from a domain classified as having cross-site tracking capabilities and the destination URL includes query string parameters or a fragment: https://webkit.org/blog/8828/intelligent-tracking-prevention-2-2/
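For reference, the ITP 2.2 rule from the WebKit post can be sketched in Python: if the previous page's domain is classified as a cross-site tracker and the landing URL carries a query string or fragment, a persistent cookie written via document.cookie has its expiry capped at one day. The function name and the `navigated_from_tracker` flag are hypothetical; how a domain gets classified is out of scope here, and session cookies are not modeled.

```python
from datetime import datetime, timedelta
from urllib.parse import urlsplit

# ITP 2.2 caps persistent client-side cookies at 1 day under the conditions below.
ITP_CAP = timedelta(days=1)


def capped_cookie_expiry(requested_expiry: datetime, now: datetime,
                         landing_url: str, navigated_from_tracker: bool) -> datetime:
    """Return the expiry a document.cookie write would receive (hypothetical helper).

    navigated_from_tracker is an assumed input: True if the page that navigated
    here belongs to a domain classified as having cross-site tracking capabilities.
    """
    parts = urlsplit(landing_url)
    # Link decoration = a non-empty query string and/or fragment on the landing URL.
    decorated = bool(parts.query) or bool(parts.fragment)
    if navigated_from_tracker and decorated:
        return min(requested_expiry, now + ITP_CAP)
    return requested_expiry


now = datetime(2019, 4, 25)
one_year = now + timedelta(days=365)
print(capped_cookie_expiry(one_year, now, "https://shop.example/?gclid=abc", True))
# prints 2019-04-26 00:00:00
```

Taking the `min` means a cookie that already asks for less than a day keeps its shorter lifetime.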

We already block the third-party scripts that would be extracting these IDs and setting a first-party tracking cookie, but we could in theory go further by:

  • emulating the cookie lifetime restriction, or
  • stripping out tracking query string parameters (e.g. gclid, fbclid, msclkid and mc_eid).
Labels: QA Pass-Linux, QA Pass-Win64, QA Pass-macOS, QA/Yes, feature/shields, priority/P3, privacy/query-filter, privacy/tracking, release-notes/include
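The second option (stripping tracking parameters) can be sketched in Python: drop the four parameters named in this issue from a URL's query string and leave everything else, including the fragment, untouched. This is an illustrative sketch, not Brave's actual implementation; the function name is hypothetical and the parameter set is only the one listed above.

```python
from urllib.parse import urlsplit, urlunsplit, unquote_plus

# Only the parameters named in this issue; real-world lists are much longer.
TRACKING_PARAMS = frozenset({"gclid", "fbclid", "msclkid", "mc_eid"})


def strip_tracking_params(url: str) -> str:
    """Remove known tracking parameters, keeping the other pairs byte-for-byte."""
    parts = urlsplit(url)
    kept = [
        pair
        for pair in parts.query.split("&")
        # Compare on the decoded parameter name, but keep the raw pair text.
        if pair and unquote_plus(pair.split("=", 1)[0]) not in TRACKING_PARAMS
    ]
    return urlunsplit(parts._replace(query="&".join(kept)))


print(strip_tracking_params("https://example.com/page?gclid=abc&likes=shoes#top"))
# prints https://example.com/page?likes=shoes#top
```

Splitting on `&` rather than round-tripping through `parse_qsl`/`urlencode` avoids re-encoding the parameters that are kept (for example, `%20` turning into `+`).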

@snyderp found a comprehensive list of tracking parameters in https://greasyfork.org/en/scripts/10096-general-url-cleaner.

A couple of questions / comments, only focusing on link decoration:

  1. Would this only apply for the sites listed in the greasyfork link above, or other domains?

  2. Re: YT in the greasyfork link, we should test to make sure blocking the prefetch doesn't break consecutive video playback.

  3. If we are already blocking 3rd parties that would profile data in conjunction with URL decoration, I am not clear on what the harm is in preventing the 1st party from using their own server logs to determine what their audience interests are, using link decorations. If the links aren't passing personal or identifiable information (given the scope/context of protection we have in place), it seems like we are removing a feature that they might leverage in the 1p context in a way that doesn't necessarily violate our privacy promises with our users.

I could be missing something, but here are the reasons why I am asking:

  1. With Brave Ads, we have some advertisers including query string params to help determine which traffic they receive via Brave Ads. Given that we hide behind the Chrome UA, there are few ways in which advertisers and publishers can determine whether our reporting aligns with theirs, until we have an Apollo-phase source of truth.

  2. If a publisher has a 1p relationship with an auth'd user, and uses link decorations as a means of optimizing or customizing content that is presented for the user, or other services used in the website, removing the decorations may break intended 1p:1p engagement behavior.

Of course, not trying to talk anyone into not providing better tracking protection, but the above items came to mind and I want to check in here to see if they were being factored in for potential impact.

@lukemulks the suggestion is not to remove all query string params, just those used specifically for tracking purposes. The ones in the link above would be a good starting point, but the list could grow or shrink depending on our boldness, measurement results, etc. So the worry is less about ?likes=shoes and more about facebook_id=<something>, that sort of thing.

FWIW, the Safari ITP approach is to block all query params set by known / labeled tracking domains. So in some senses more aggressive, some senses less.
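To make the contrast concrete, here is a Python sketch of the two policies as they are characterized in this thread: a name-based denylist that leaves ordinary parameters such as `likes=shoes` alone, versus dropping the whole query string whenever the link came from a domain labeled as a tracker. Both function names, the `facebook_id` entry, and the `source_is_labeled_tracker` flag are illustrative assumptions.

```python
from urllib.parse import urlsplit, urlunsplit, unquote_plus

# Hypothetical denylist; facebook_id is the example name used in this thread.
DENYLIST = frozenset({"gclid", "fbclid", "msclkid", "mc_eid", "facebook_id"})


def denylist_policy(url: str, denylist: frozenset) -> str:
    """Strip only parameters whose names are on the denylist."""
    parts = urlsplit(url)
    kept = [p for p in parts.query.split("&")
            if p and unquote_plus(p.split("=", 1)[0]) not in denylist]
    return urlunsplit(parts._replace(query="&".join(kept)))


def source_policy(url: str, source_is_labeled_tracker: bool) -> str:
    """Drop every query parameter if the navigation came from a labeled tracker."""
    if not source_is_labeled_tracker:
        return url
    return urlunsplit(urlsplit(url)._replace(query=""))


u = "https://shop.example/?likes=shoes&facebook_id=123"
print(denylist_policy(u, DENYLIST))  # prints https://shop.example/?likes=shoes
print(source_policy(u, True))        # prints https://shop.example/
```

The denylist keeps first-party parameters wherever the link came from; the source-based policy is blunter for labeled trackers but never touches links from unlabeled sites.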

So I think the suggestion would steer clear of the concerns you mentioned, and that if we interfered with the use cases you mentioned, that'd be in most (if not all) cases a bug. WDYT?

I'm so late in the game on this thread @snyderp, apologies; to answer your question, it sounds good to me. Thank you for addressing the concerns, and explaining the context clearly in your response.

Verification passed on

Brave | 0.72.112 Chromium: 78.0.3904.70 (Official Build) dev (64-bit)
-- | --
Revision | edb9c9f3de0247fd912a77b7f6cae7447f6d3ad5-refs/branch-heads/3904@{#800}
OS | Ubuntu 18.04 LTS

Verified test plan from https://github.com/brave/brave-core/pull/3239

Verification passed on

Brave | 1.1.1 Chromium: 78.0.3904.97 (Official Build) beta (64-bit)
-- | --
Revision | 021b9028c246d820be17a10e5b393ee90f41375e-refs/branch-heads/3904@{#859}
OS | macOS Version 10.13.6 (Build 17G5019)

Verification passed on

Brave | 1.1.1 Chromium: 78.0.3904.97 (Official Build) beta (64-bit)
-- | --
Revision | 021b9028c246d820be17a10e5b393ee90f41375e-refs/branch-heads/3904@{#859}
OS | Windows 10 OS Version 1803 (Build 17134.1006)

This is an often-overlooked form of tracking, so good job deciding to add this to the browser!
Though, from what I can tell (please correct me if I'm wrong), the implementation you've gone with is currently extremely narrow in scope, whereas this issue appears to have been intended to be general in purpose (but was closed once that narrow implementation landed). The brief description of this feature in the release notes also suggests a general, even potentially comprehensive, solution. An accurate description would mention that only a select few query parameters (gclid, fbclid, msclkid and mc_eid) are handled, out of the many others commonly used for tracking across the web.

In any case, if you wish to implement a real solution for the type of tracking in this issue's title, as was alluded to in this thread, many comprehensive solutions exist (for example, the ClearURLs extension for Chrome/Firefox, whose code and parameter-filter lists are publicly viewable).

@Vagmer gotta crawl before you walk ;) We're addressing what seem to be the heaviest hitters now, and can scale up as we gain confidence we're not busting things for users.

That additional set of tracking-related query parameters looks very interesting, thank you for linking! From eyeballing though, it looks like at least some may be used for purely 1p purposes, which we don't target. More generally though, this list seems to address a site tracking a user once the user lands on that site (e.g. how a user got to amazon.com), when the bigger concern (from our end) is people using query parameters to track users across a large portion of the web (e.g. social embeds and similar getting known query params across all sites). Do you know if there is a similar, expanded list that targets that second problem?

@snyderp:

gotta crawl before you walk ;) We're addressing what seem to be the heaviest hitters now, and can scale up as we gain confidence we're not busting things for users.

Oh, definitely makes sense. I can understand and agree with that approach, it just struck me that both the immediate closure of this issue and the (inaccurate) inclusion of this as a general feature in the release notes seem to signal that this was considered done with.

That additional set of tracking-related query parameters looks very interesting, thank you for linking! From eyeballing though, it looks like at least some may be used for purely 1p purposes, which we don't target. [...]

That extension and its rules are expansive and they fulfill more than a singular purpose that fits under cleaning URLs, so that wouldn't be surprising... It strips various tracking parameters, other "junk" or extraneous parameters, even skips intermediate redirection URLs/pages, etc... It also endeavors to include exclusions or otherwise shape rules to avoid the rare associated breakage. Personally, I've faced no issues with it, though occasionally such breakages are fixed after user reports.

Do you know if there is a similar, expanded list that targets that second problem?

That list includes the ubiquitous ones as well (such as utm_* parameters). Unfortunately, I don't know of a specialized or more descriptive list. Maybe the developer of that extension or its repo has one. I know there are many more extensions (or userscripts) with the same purpose (there's an incomplete listing on ClearURLs's wiki, and elsewhere), though. The one I mentioned just seems to be the most extensive and advanced one I've come across.
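Families like utm_* are easier to cover with a wildcard than with exact names. A minimal Python sketch using shell-style globbing from the standard `fnmatch` module; the pattern list and function names are hypothetical, and the lists mentioned above may use a different rule syntax.

```python
import fnmatch
from urllib.parse import urlsplit, urlunsplit, unquote_plus

# Hypothetical rules: exact names plus one wildcard family.
PATTERNS = ("gclid", "fbclid", "msclkid", "mc_eid", "utm_*")


def is_tracking(name: str) -> bool:
    """True if the parameter name matches any rule (case-sensitive)."""
    return any(fnmatch.fnmatchcase(name, pat) for pat in PATTERNS)


def clean(url: str) -> str:
    """Drop every query parameter whose name matches a rule."""
    parts = urlsplit(url)
    kept = [p for p in parts.query.split("&")
            if p and not is_tracking(unquote_plus(p.split("=", 1)[0]))]
    return urlunsplit(parts._replace(query="&".join(kept)))


print(clean("https://example.com/?utm_source=news&utm_medium=email&id=7"))
# prints https://example.com/?id=7
```

Because `utm_*` requires the underscore, a parameter named `utmost` is left alone.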

Is this configurable by Shields or enabled for everyone?

@Madis0 should be fixed for everyone 👍 No shields configuration needed.
cc: @fmarier

FWIW I think this behaviour should be disabled when shields are down for a site.

@Bonemeijer Have you found any breakage related to this?

"Shields down" is a webcompat-related toggle, and I'm not aware of any compatibility problems with this protection.

If nothing else, it could confuse web developers using Brave.

@fmarier I noticed that, for sites I'm working on, Brave removes the gclid parameter from query strings, which I think is good. However, this behaviour persists even when shields are down for that site. It also happens when the URL is entered manually, so the request does not originate from a shields-up location.

You can try it yourself by

  • opening any website, e.g. google.com
  • clicking the Brave icon and choosing "shields down"
  • appending the query string ?gclid=1 to the URL
  • and noticing how the gclid parameter disappears
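When debugging this kind of surprise, one quick check is to compare the query parameters in the URL that was typed with those in the URL that actually loaded (for example, `location.href` copied from the page). A small Python sketch; the function name is hypothetical:

```python
from urllib.parse import urlsplit, parse_qsl


def removed_params(requested_url: str, loaded_url: str) -> list:
    """List parameter names present in the requested URL but missing after load."""
    before = [k for k, _ in parse_qsl(urlsplit(requested_url).query, keep_blank_values=True)]
    after = {k for k, _ in parse_qsl(urlsplit(loaded_url).query, keep_blank_values=True)}
    return [k for k in before if k not in after]


print(removed_params("https://www.google.com/?gclid=1", "https://www.google.com/"))
# prints ['gclid']
```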

Now, this might be the expected behaviour according to how it is programmed. But as an end user, I would expect the "shields down" setting for a location to halt any blocking done for that specific location. As an end user who is also a web developer, I might even expect Brave to send its own user-agent string.

Not all of Brave's protections can be disabled via Shields. If we determine that a protection doesn't have any negative impact on our users, we don't necessarily provide a toggle. I can see how it can be surprising for developers who aren't expecting this behavior.

Tying this feature to the toggle is certainly something we would consider for this feature if we discovered problems affecting our users.
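If the protection were tied to the Shields toggle, the check could look roughly like this Python sketch. `filter_navigation` and its `shields_up` argument are hypothetical; per this thread, Brave at the time stripped these parameters regardless of Shields state.

```python
from urllib.parse import urlsplit, urlunsplit, unquote_plus

# The parameters named in this issue.
TRACKING_PARAMS = frozenset({"gclid", "fbclid", "msclkid", "mc_eid"})


def filter_navigation(url: str, shields_up: bool) -> str:
    """Hypothetical: strip tracking params only when Shields are up for the site."""
    if not shields_up:
        return url  # Shields down: leave the URL exactly as requested.
    parts = urlsplit(url)
    kept = [p for p in parts.query.split("&")
            if p and unquote_plus(p.split("=", 1)[0]) not in TRACKING_PARAMS]
    return urlunsplit(parts._replace(query="&".join(kept)))


print(filter_navigation("https://www.google.com/?gclid=1", shields_up=False))
# prints https://www.google.com/?gclid=1
```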

Sounds fair enough.

Without knowing the full philosophy and background of the Brave project, as an end user I would expect "shields down" to mean "I trust this site; allow it to show ads and gather statistics". And I would expect that to cover any alterations to the URL or query string as well.

Now that I know about the behaviour, I know I have to work around it by using another browser. But it did have me chasing my own tail for a minute.
