User.js: cache cache cache

Created on 8 Sep 2018 · 26 Comments · Source: arkenfox/user.js

I know I added this commit after :cat2: took over and spammed #436 to pieces ... so anyway .. new clean topic

So I was looking at this 7yr old post and was wondering about a few things

2.) Storing arbitrary data in the Last-Modified header

If you send a HTTP response header as follows:

Last-Modified: **UNIQUE_ID**

The browser will send back the UNIQUE_ID when you make the same request later on:

If-Modified-Since: **UNIQUE_ID**

I'm not sure of the process here. What exactly triggers the browser to check if a new resource is available? Isn't this meant to be (or limited to) a date/time? If a date/time can it be rounded? What happens when it's blank? Is this still a tracking threat?

All 26 comments

What exactly triggers the browser to check if a new resource is available?

I have yet to investigate the underlying mechanics more but, from what I've read/tested so far, in Firefox this usually happens when you reload a page (clicking the reload button or pressing F5), or when a script in the page tries to reload the page or some parts of it. If you instead reload a page by clicking the address bar and then pressing Enter, the browser uses the memory cache and it may not make even a single request. I found this out when I was trying to get Detect Cloudflare+ to also count the number of requests for resources that Firefox had cached previously.

Another thing that can trigger this is a particular function in the webRequest API that some extensions use: handlerBehaviorChanged(). It triggers this because it flushes the memory cache (at least according to documentation).

Isn't this meant to be (or limited to) a date/time?

Indeed. HTTP headers are standards, really. I'm guessing this depends on the browser's implementation/specification, so I would have to actually get off my ass and test this on Firefox to be sure, but at least according to documentation it only accepts date strings:

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Last-Modified
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Modified-Since

This means that Firefox, in theory, should not accept Last-Modified headers that look like: fas987fa9s87fa696f or whatever. Maybe it did accept such values back when that guy wrote that, but it shouldn't be true nowadays. However, I think it is still theoretically possible for servers to change the value of the header on every single request (like adding a second every time), which would basically turn the string into a unique ID while still being a valid date string. I never wrote a server application though, so there might be factors that I'm not aware of that make this impractical. I would have to investigate more, to be honest.
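To make the "valid date string as a unique ID" idea concrete, here is a minimal sketch of how a server could hide a counter in the seconds of a well-formed HTTP date. The base date `EPOCH` and both function names are hypothetical, made up for illustration; nothing here is taken from a real tracker.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical base instant picked by the tracker; any fixed past date would do.
EPOCH = datetime(2015, 1, 1, tzinfo=timezone.utc)


def id_to_last_modified(visitor_id: int) -> str:
    """Hide an integer ID in the seconds offset of a valid HTTP date."""
    return format_datetime(EPOCH + timedelta(seconds=visitor_id), usegmt=True)


def last_modified_to_id(header_value: str) -> int:
    """Recover the ID from the date the browser echoes in If-Modified-Since."""
    stamp = parsedate_to_datetime(header_value)
    return int((stamp - EPOCH).total_seconds())


header = id_to_last_modified(123456)
print(header)                       # a perfectly ordinary-looking HTTP date
print(last_modified_to_id(header))  # the ID comes back out unchanged
```

The point is only that the value passes any format check while still carrying one ID per second of offset.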

Yup, assumption was we were ignoring a hard reload (which would/should request everything anew)

edit:
Example (from MDN): Last-Modified: Wed, 21 Oct 2015 07:28:00 GMT

So in theory, we could easily round that with a function in say Header Editor (right?) - the question would be what breakage. It might be interesting to collect these values .. wonder if that could be automated easily to see what most sites do?
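A rough sketch of what such a rounding function might look like, written in Python rather than as an actual Header Editor rule. It assumes the incoming value is a well-formed HTTP date and truncates it to midnight GMT; `round_last_modified` is a hypothetical name, and this says nothing about breakage.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def round_last_modified(header_value: str) -> str:
    """Truncate a valid HTTP date to midnight GMT of the same day."""
    stamp = parsedate_to_datetime(header_value).astimezone(timezone.utc)
    rounded = stamp.replace(hour=0, minute=0, second=0, microsecond=0)
    return format_datetime(rounded, usegmt=True)


# The MDN example value from above, rounded down to the day.
print(round_last_modified("Wed, 21 Oct 2015 07:28:00 GMT"))
```

A malformed value makes `parsedate_to_datetime` raise, so a real rule would have to decide what to do in that case (drop the header, pass it through, etc.).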

We are ignoring hard reloads, though. Those flush both memory AND disk cache, but I wasn't talking about those! I'm talking about simple reloads.

I'm not sure of the process here.

Header Fields.

What exactly triggers the browser to check if a new resource is available?

Cache-Control: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9
Pragma: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.32

Isn't this meant to be a date/time?

UNIQUE_ID is definitely a hack, not respecting the standard.

If a date/time can it be rounded?

It's based on the client OS date and time. Non-standard hacks do exist, RunAsDate comes to mind.

What happens when it's blank? Is this still a tracking threat?

Most probably the resource is reloaded, a void string is surely better than any UNIQUE_ID

This means that Firefox, in theory, should not accept Last-Modified headers that look like [...] whatever.

Firefox can't make use of that ID, but the server - maliciously tailored - uses that header for its own purposes.

we could easily round that with a function in say Header Editor (right?)

I could do it with some regex-fu, but I doubt its usefulness because we can't modify the outgoing If-Modified-Since header, we can only modify the incoming Last-Modified. If someone uses this to track you, they will disregard whatever If-Modified-Since your browser sends and force you to re-fetch the resource every time (sending you a new Last-Modified header), which still makes you unique (or almost unique) to them. Breakage on the other hand shouldn't be an issue at all, worst-case scenario you just end up re-fetching the resource almost as if it wasn't in your cache in the first place.

It might be interesting to collect these values .. wonder if that could be automated easily to see what most sites do?

It could be automated, but you would only collect data about the communications between the server and you, therefore you can't know for sure whether they're sending you a unique date or not that way (because you don't know what the server is sending to other people).

Firefox can't make use of that ID, but the server - maliciously tailored - uses that header for its own purposes.

I meant that when a server sends you non-standard Last-Modified headers, Firefox should not send the non-standard If-Modified-Since in subsequent requests. I'm sorry to say I was wrong, though. I just tested it (on FF 62) and it doesn't give a shit what the Last-Modified header says as long as it says something :man_facepalming:. This means the documentation at MDN is merely describing the standard, and this pair of headers is just as easy to use for tracking as ETag and If-None-Match.
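For anyone who wants to see the round trip spelled out, here is a toy simulation, not real network code. `TrackingServer` and `EchoingClient` are hypothetical classes; the client is assumed to echo the stored Last-Modified value verbatim, which is the behaviour described in the test above.

```python
import uuid


class TrackingServer:
    """Toy server that abuses Last-Modified as an identifier store."""

    def __init__(self):
        self.visits = {}  # opaque ID -> number of requests seen

    def handle(self, request_headers):
        visitor_id = request_headers.get("If-Modified-Since")
        if visitor_id in self.visits:
            # Known visitor: count the visit and let the cache copy stand.
            self.visits[visitor_id] += 1
            return 304, {}
        # Unknown visitor: mint an ID and plant it in Last-Modified.
        visitor_id = uuid.uuid4().hex
        self.visits[visitor_id] = 1
        return 200, {"Last-Modified": visitor_id}


class EchoingClient:
    """Toy cache that echoes whatever Last-Modified it stored, unvalidated."""

    def __init__(self):
        self.cached_validator = None

    def fetch(self, server):
        headers = {}
        if self.cached_validator is not None:
            headers["If-Modified-Since"] = self.cached_validator
        status, response_headers = server.handle(headers)
        if "Last-Modified" in response_headers:
            self.cached_validator = response_headers["Last-Modified"]
        return status


server = TrackingServer()
client = EchoingClient()
print(client.fetch(server), client.fetch(server))  # first a 200, then a 304
print(server.visits)                               # one ID, seen twice
```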

So, what can we do about this? If we remove both ETag and Last-Modified unconditionally, we're re-fetching all GET requests every single time, which is worse than disabling the cache altogether. This is yet another good reason for using the Temporary Containers extension...

Something we could do via an extension is validate the value of the Last-Modified header to decide whether to drop it or not. If it follows the standard, it stays - if not, it goes. It wouldn't prevent servers from tracking us by sending us valid unique strings, though. Needs more consideration.
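As a sketch of the "if it follows the standard, it stays" check: the function below accepts only the fixed-length GMT date format shown on MDN and rejects everything else. `keep_last_modified` is a hypothetical name, the older obsolete date formats are deliberately not handled, and an actual extension would do this in JavaScript rather than Python.

```python
import re
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Shape of e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
HTTP_DATE = re.compile(
    r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun), "
    r"(?P<day>\d{2}) (?P<mon>" + "|".join(MONTHS) + r") (?P<year>\d{4}) "
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}) GMT"
)


def keep_last_modified(value: str) -> bool:
    """Return True if the header value is a well-formed, real calendar date."""
    match = HTTP_DATE.fullmatch(value)
    if match is None:
        return False
    try:
        # The regex only checks the shape; this rejects things like 31 Feb.
        datetime(int(match["year"]), MONTHS.index(match["mon"]) + 1,
                 int(match["day"]), int(match["h"]), int(match["m"]),
                 int(match["s"]))
    except ValueError:
        return False
    return True


print(keep_last_modified("Wed, 21 Oct 2015 07:28:00 GMT"))
print(keep_last_modified("fas987fa9s87fa696f"))
```

Note that the weekday name is not cross-checked against the date, and a value that passes can still be unique per visitor.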

Great replies, we looked at the headers in client and server, but what about the HTTP Responses?

Here's a good text from a company that makes a free tool (Fiddler) that I use sometimes; the tool lets you prevent the issue Pants raised at the top.

Claustromaniac, you're right about ETag (also mentioned in the text linked above):

A client can only send a Conditional Validation request if it has a cached copy of a resource and that copy includes either a Last-Modified or ETag response header.

But why are you saying:

If we remove both ETag and Last-Modified unconditionally, we're re-fetching all GET requests every single time, which is worse than disabling the cache altogether.

It is better in this situation... we get closer to a live OS (without persistence between sessions) with these settings.

but what about the HTTP Responses?

What do you mean?

Here's a good text...

Good text indeed :+1: It summarises very well most of the important points relevant to this conversation.

But why are you saying...

I was thinking about two things at the time. The first was performance. If we choose to remove those two response headers instead of disabling disk & memory cache, we end up caching resources but we don't use the cache for almost anything. We waste processor time both caching and removing the headers, and we allocate RAM that we rarely get to use.

The second thing was that we need the onHeadersReceived event for removing headers, which is troublesome (edit: not for this purpose). If it were my intention to re-fetch everything every time, I'd rather disable cache completely instead of using an onHeadersReceived event handler that would conflict with other extensions.

Other than that, there isn't much difference AFAIK.

So, answering the 2 remaining questions in the OP, merely for completeness...

What happens when it's blank?

In my very short tests, it seemed Firefox sent the previous If-Modified-Since when that resource was cached in prior requests (while I wasn't removing the Last-Modified headers), the rest of the time it didn't send the If-Modified-Since. I'm not fully convinced about this test though, I should test it again, but I'm too lazy right now.

Is this still a tracking threat?

I probably don't need to answer this one anymore. But yes. My verdict is:

Disk Cache is Evil :imp:

EDIT: memory cache is cool. (see below)

... which sucks, because it's so F'ing useful... Ecological, even.

Correct me, please, if I am wrong here:
Disabling cache (disk&memory) deals with ETag and *Modified headers effectively for anti-tracking.

Currently I have disk caching disabled and memory enabled (with Header Editor to deal with ETags)... but I really do not notice any significant downside when both caching are disabled.

Disabling disk & memory cache means all these caching tracking mechanisms/techniques are obsolete.

I disabled memory cache (disk cache has always been off) a while back for a wee test, and didn't really notice much difference either. However, depending on the site and my workflow, it can make a difference - I have one site I need to page thru a lot, and all the common elements add up to quite a bit. With TC this wouldn't happen until the last tab of that domain closed

TC is looking to be the ultimate tool IMO

@claustromaniac HTTP Responses are the replies from the server. Status responses such as 200 OK or 304 Not Modified come before the actual response containing data.

Squid allows excluding specific sites and file formats from caching.

@crssi and @Thorin-Oakenpants,

The reason you don't notice much difference between enabled and disabled memory cache is that Firefox actually doesn't use it very often. It mostly uses it when you use the Back and Forward navigation actions, the rest of the time it pretty much doesn't use it at all. Also, as I said in my first post, extensions can flush the memory cache forcibly to refresh their data, which means that it is entirely possible for you to have memory cache enabled and still never really make use of it, depending on the circumstances. I found this out very recently, to be honest. It turns out memory cache is generally not as well understood as disk cache (and kind of overrated, too).

TC is looking to be the ultimate tool IMO

I concur.

@Atavic OK, I didn't know what you were referring to. I know those as HTTP status codes.

^^ Thank you. From what you are saying I really do not see the reason why ghacks user.js would not have memory cache also disabled by default?

Well, personally, I still find memory cache useful because those few times that Firefox does use it, it doesn't initiate network requests at all - it just grabs everything from memory and some servers may not even find out that you went back to a previous page.

In contrast, if you have memory cache disabled, Firefox always has to talk to the servers to decide what to do, without exceptions.

Disk cache is not evil, it's being abused to do evil things. It's easier to track _without_ a disk (or memory) cache and in real time. One of the things I do from a proxy upstream is strip most headers from cached content and rewrite the others for reasons like this.

From what you are saying I really do not see the reason why ghacks user.js would not have memory cache also disabled by default?

When we get around to it - the idea is to have a set of 20 or 30 prefs for "relaxed" (so people can watch their cat videos and use google docs/maps, outlook.com etc) and I can think of 10 or 20 for a "hardened" js. The current master is just a template

IMO this sort of tracking would be rare [citation needed] - there is so much other low hanging fruit for corporate surveillance. Depends on your threat model. If someone (big) was doing this, it probably would have come to light - but then again shrug, it took 2 years for Verizon to be outed for injecting that shit into mobile HTTP traffic, and even though people have known for 7 years about SSL Session Ticket IDs, no-one seems to care - and it takes a paper just released by a university to get it back into the news a little.

In contrast, if you have memory cache disabled, Firefox always has to talk to the servers to decide what to do, without exceptions.

True

Disk cache is not evil, it's being abused to do evil things

We're just playing on Google's do no evil mantra. We're a strange bunch here :) Especially that :cat2:

So all I wonder now is ... is it worth filing a bugzilla for Last-Modified to only accept valid date/time input

@fmarier

Your bugzilla skills are better than mine. What component would this be under, or is there an existing ticket? (I did a few searches but couldn't find anything)

when a server sends you non-standard Last-Modified headers, Firefox should not send the non-standard If-Modified-Since in subsequent requests. I'm sorry to say I was wrong, though. I just tested it (on FF 62) and it doesn't give a shit what the Last-Modified header says as long as it says something :man_facepalming:. This means the documentation at MDN is merely describing the standard, and this pair of headers is just as easy to use for tracking as ETag and If-None-Match.

Bug: sanitize incoming `Last-Modified` and/or outgoing `If-Modified-Since` to valid date/time as per [1] [2]
[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Last-Modified
[2] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Modified-Since

^^ @tomrittervg as well :)

IMHO it's not worth it. Even if Last-Modified were validated, it would still be a tracking vector. I don't see any optimal solution to that. I can only think of taxing heuristics at best.

I hope I'm wrong, but if I'm not, a confirmation would still be nice.

Oh I agree, but number of seconds is finite - so it's definitely closing a loophole. I just want standards to be adhered to, and it would be interesting to see what happens - eg major breakage reports

This comment was hidden to preserve mouse wheels.

number of seconds is finite - so it's definitely closing a loophole

Yeah, I thought that, too. But, if the standard were enforced, timestamps set sometime in the future would still do the trick. And then, what? Up to this point, something simple like a regular expression would be enough to implement validation, but then that wouldn't be enough.

They could then try to validate the date itself to make sure it's not set sometime in the future, but what would define the present? The time in the client's system or the time in the server's system? They should also take into account their time zones: the Last-Modified value could be set a few hours in the future for the purpose of tracking you, or the server could simply be in Japan.

All of that could probably be solved with a lot of logic, but the performance hit would get nasty, I reckon.
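For what it's worth, the simplest part of that check might look like the sketch below. Valid HTTP dates are always expressed in GMT, so the comparison can be done in UTC and the server's location drops out; what remains is clock skew, handled here with an arbitrary tolerance. `is_not_in_future` and the 5-minute default are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_not_in_future(header_value: str, now: datetime,
                     tolerance: timedelta = timedelta(minutes=5)) -> bool:
    """Check a valid HTTP date against the client's clock, allowing some skew.

    `now` must be a timezone-aware datetime (e.g. datetime.now(timezone.utc)).
    """
    stamp = parsedate_to_datetime(header_value)
    if stamp.tzinfo is None:
        # Treat a zone-less result as GMT, which is what HTTP dates use.
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp <= now + tolerance


client_now = datetime(2018, 9, 8, 12, 0, tzinfo=timezone.utc)
print(is_not_in_future("Wed, 21 Oct 2015 07:28:00 GMT", client_now))
print(is_not_in_future("Sat, 08 Sep 2018 15:00:00 GMT", client_now))
```

This only trusts the client's own clock, and it does nothing about unique timestamps set in the past.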

Let's say they do that. Then, to face this practical limitation, our hypothetical tracking fellas could give the whole thing more complexity and simply start reusing timestamps whenever they can. I mean, if I can link a series of timestamps for a single resource to a single client, then every time I give that client a new timestamp for that resource, I no longer need to remember the previous timestamps. I can give them to clients that haven't yet cached my resource.

I can go on and on, but that should make my point. It'd be a helluva lot of added complexity for a suboptimal solution.

I too want standards to be adhered to, but I'd say this is rather a design flaw/oversight. As long as it is the servers who have the power to decide what to do with those conditional requests, the clients will be at risk of being tracked. And I don't think that can be changed without breaking the current model. Those headers should instead be deprecated and replaced with something new and different, in that case. (That would be awesome, though)

I'll add @englehardt for comments about whether cache-based tracking is 'rare' or 'common' these days; but it's been in the wild for a decade: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1898390

I agree that the number of seconds is large enough that limiting Last-Modified to valid timestamps isn't an effective mitigation for tracking.
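A quick back-of-the-envelope calculation shows why. The sketch below counts how many distinct one-second timestamps fit in a window of past dates and converts that to bits of identifier space; the 10-year window is just an example figure, not something measured.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # average year, leap days included


def distinct_timestamps(years: float) -> int:
    """Number of one-second timestamps available in a window of `years`."""
    return int(years * SECONDS_PER_YEAR)


def identifier_bits(years: float) -> float:
    """Bits of identifier space those timestamps provide."""
    return math.log2(distinct_timestamps(years))


print(distinct_timestamps(10))        # hundreds of millions of values
print(round(identifier_bits(10), 1))  # roughly 28 bits
```

And that is per cached resource: a page that loads several resources from the same server multiplies the available space.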

As far as Cache Usage, we have telemetry on that I think, which would tell you how much users in the wild take advantage of cache. Check https://telemetry.mozilla.org and https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/Histograms.json You may need to search the code for a probe to fully understand what it's measuring.

Well, it was worth asking :) TBH there are already (more comprehensive and complete) solutions, disable all cache or use Temporary Containers in a hardened config.

OT: @tomrittervg Did my email to you and Arthur re SSL Session Resumption get lost?

Quoting myself...

Those headers should instead be deprecated and replaced with something new and different, in that case. (That would be awesome, though)

It seems I found that _something_, but I don't think it is new:

Cache-Control: immutable
Expires: <date>

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control

immutable

Indicates that the response body will not change over time. The resource, if unexpired, is unchanged on the server and therefore the client should not send a conditional revalidation for it (e.g. If-None-Match or If-Modified-Since) to check for updates, even when the user explicitly refreshes the page. Clients that aren't aware of this extension must ignore them as per the HTTP specification. In Firefox, immutable is only honored on https:// transactions.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Expires
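Reading the MDN text above literally, the client-side decision could be sketched like this. It is a simplified model, not Firefox's actual logic: `sends_conditional_request_on_reload` is a hypothetical name, freshness is passed in as a flag instead of being computed from Expires or max-age, and the https-only rule is the Firefox behaviour quoted above.

```python
def sends_conditional_request_on_reload(cached_headers: dict,
                                        is_fresh: bool,
                                        is_https: bool) -> bool:
    """Decide whether a reload would send If-None-Match / If-Modified-Since."""
    directives = {
        part.strip().lower()
        for part in cached_headers.get("Cache-Control", "").split(",")
    }
    has_validator = "Last-Modified" in cached_headers or "ETag" in cached_headers
    if not has_validator:
        # Nothing to echo back, so no conditional request is possible.
        return False
    if "immutable" in directives and is_fresh and is_https:
        # Unexpired immutable resource: skip revalidation entirely.
        return False
    return True


cached = {
    "Cache-Control": "max-age=31536000, immutable",
    "Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT",
}
print(sends_conditional_request_on_reload(cached, is_fresh=True, is_https=True))
print(sends_conditional_request_on_reload(cached, is_fresh=True, is_https=False))
```

In the first case no validator goes back to the server until the resource expires, so nothing gets echoed on reloads in the meantime.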
