OkHttp's Cache seems to replace an existing response entry if another comes from the same target but with different values for request's Vary fields. For example consider this test case:
@Test void vary() throws Exception {
var server = new MockWebServer();
server.setDispatcher(new Dispatcher() {
@Override public MockResponse dispatch(RecordedRequest recordedRequest) {
var baseResponse = new MockResponse()
.addHeader("Cache-Control", "max-age=69420")
.addHeader("Vary", "Accept-Encoding");
var body = "ZIP ME!";
var encoding = recordedRequest.getHeader("Accept-Encoding");
if ("gzip".equalsIgnoreCase(encoding)) {
return baseResponse.addHeader("Content-Encoding", "gzip")
.setBody(ZipUtils.gzip(body));
} else if ("deflate".equalsIgnoreCase(encoding)) {
return baseResponse.addHeader("Content-Encoding", "deflate")
.setBody(ZipUtils.deflate(body));
} else {
return baseResponse.setBody(body);
}
}
});
server.start(8080);
var client = new OkHttpClient.Builder()
.cache(new Cache(Files.createTempDirectory("okhttp-cache").toFile(), 10 * 1024))
.build();
var gzipRequest = new Request.Builder()
.url(server.url("/"))
.header("Accept-Encoding", "gzip")
.build();
try (var response = client.newCall(gzipRequest).execute()) {
assertEquals("ZIP ME!", ZipUtils.gunzip(response.body().bytes()));
assertEquals(1, client.cache().networkCount());
assertEquals(0, client.cache().hitCount());
}
// gzip response gets replaced as both requests have same key using by the cache (URL)
var deflateRequest = new Request.Builder()
.url(server.url("/"))
.header("Accept-Encoding", "deflate")
.build();
try (var response = client.newCall(deflateRequest).execute()) {
assertEquals("ZIP ME!", ZipUtils.inflate(response.body().bytes()));
assertEquals(2, client.cache().networkCount());
assertEquals(0, client.cache().hitCount());
}
try (var response = client.newCall(gzipRequest).execute()) {
assertEquals("ZIP ME!", ZipUtils.gunzip(response.body().bytes()));
// Expected: use previously saved response
assertEquals(2, client.cache().networkCount()); // fails: expected 2 but was 3
assertEquals(1, client.cache().hitCount()); // fails: expected 1 but was 0
}
}
I don't know if this is intentional or not but if I understand correctly, a cache is allowed to have multiple response entries per cache key, each with a secondary key represented by request's selecting headers values (last paragraph of RFC 7234 sec 2). If this is indeed intentional, it'd be much appreciated to know reasons why.
Thanks in advance & great library!
Confirmed. You're the first person to notice that independent variations are missing, so it might have been the right call! Or at least at the time.
OkHttp鈥檚 cache was originally designed to be concurrent enough for single-user applications (like a browser) but it's now commonly used in server and cloud environments where this design is too limiting.
What's your use case? Presumably we need something other than an MD5 of the URL as a cache key.
Hi jesse, thanks for the response.
Unfortunately, I don't have a specific use case for this, I've just been messing with the cache lately :p. However, I can imagine one for a single-user being having multiple representations for a resource (say multiple languages for an article). With the current design, switching to one representation (different Accept-...) causes the cache to invalidate the other.
As for the key, I'm not sure how you'd integrate Vary fields since you probably need to read entry's metadata with URL key first. Since DiskLruCache acts as a ListMultimap in some sense, I'm guessing you might use an index for each variant, with the metadata mapping each request's Vary field values to it's corresponding index (variant).
I suspect anyone who hit this probably found it quicker to append a fake "?variant=XML" on the URL than explain the bug and request a fix. It seems like a useful feature if done right, there might even be broader issues here like cache pollution we aren't handling?
I don't believe there's cache pollution; it'll just evict something it doesn't need to evict when you switch variants.
Because fixing this has a real runtime cost (more work for cache lookup) and unclear benefit my preference is to icebox this.
I don't believe there's cache pollution; it'll just evict something it doesn't need to evict when you switch variants.
Is that because if it's per user or per variant it just won't be very cacheable? Public caches would see problems well before we do?
Its because if we have a variant mismatch we treat it like a stale cache entry. The old variant is not used and the new one will clobber it.
Going to close this out rather than icebox it. Hasn't been a priority and @mizosoft doesn't have a firm need for it.