Okhttp: OkHttp's cache doesn't save multiple response variants for the same target

Created on 6 Aug 2020 · 7Comments · Source: square/okhttp

OkHttp's Cache seems to replace an existing response entry if another comes from the same target but with different values for request's Vary fields. For example consider this test case:

@Test void vary() throws Exception {
  var server = new MockWebServer();
  server.setDispatcher(new Dispatcher() {
    @Override public MockResponse dispatch(RecordedRequest recordedRequest) {
      var baseResponse = new MockResponse()
          .addHeader("Cache-Control", "max-age=69420")
          .addHeader("Vary", "Accept-Encoding");
      var body = "ZIP ME!";

      var encoding = recordedRequest.getHeader("Accept-Encoding");
      if ("gzip".equalsIgnoreCase(encoding)) {
        return baseResponse.addHeader("Content-Encoding", "gzip")
            .setBody(ZipUtils.gzip(body));
      } else if ("deflate".equalsIgnoreCase(encoding)) {
        return baseResponse.addHeader("Content-Encoding", "deflate")
            .setBody(ZipUtils.deflate(body));
      } else {
        return baseResponse.setBody(body);
      }
    }
  });
  server.start(8080);

  var client = new OkHttpClient.Builder()
      .cache(new Cache(Files.createTempDirectory("okhttp-cache").toFile(), 10 * 1024))
      .build();

  var gzipRequest = new Request.Builder()
      .url(server.url("/"))
      .header("Accept-Encoding", "gzip")
      .build();
  try (var response = client.newCall(gzipRequest).execute()) {
    assertEquals("ZIP ME!", ZipUtils.gunzip(response.body().bytes()));
    assertEquals(1, client.cache().networkCount());
    assertEquals(0, client.cache().hitCount());
  }

  // gzip response gets replaced as both requests have same key using by the cache (URL)
  var deflateRequest = new Request.Builder()
      .url(server.url("/"))
      .header("Accept-Encoding", "deflate")
      .build();
  try (var response = client.newCall(deflateRequest).execute()) {
    assertEquals("ZIP ME!", ZipUtils.inflate(response.body().bytes()));
    assertEquals(2, client.cache().networkCount());
    assertEquals(0, client.cache().hitCount());
  }

  try (var response = client.newCall(gzipRequest).execute()) {
    assertEquals("ZIP ME!", ZipUtils.gunzip(response.body().bytes()));
    // Expected: use previously saved response
    assertEquals(2, client.cache().networkCount()); // fails: expected 2 but was 3
    assertEquals(1, client.cache().hitCount());     // fails: expected 1 but was 0
  }
}

I don't know if this is intentional or not but if I understand correctly, a cache is allowed to have multiple response entries per cache key, each with a secondary key represented by request's selecting headers values (last paragraph of RFC 7234 sec 2). If this is indeed intentional, it'd be much appreciated to know reasons why.

Thanks in advance & great library!

bug enhancement

Source

mizosoft

All 7 comments

Confirmed. You're the first person to notice that independent variations are missing, so it might have been the right call! Or at least at the time.

OkHttp’s cache was originally designed to be concurrent enough for single-user applications (like a browser) but it's now commonly used in server and cloud environments where this design is too limiting.

What's your use case? Presumably we need something other than an MD5 of the URL as a cache key.

swankjesse on 6 Aug 2020

Hi jesse, thanks for the response.

Unfortunately, I don't have a specific use case for this, I've just been messing with the cache lately :p. However, I can imagine one for a single-user being having multiple representations for a resource (say multiple languages for an article). With the current design, switching to one representation (different Accept-...) causes the cache to invalidate the other.

As for the key, I'm not sure how you'd integrate Vary fields since you probably need to read entry's metadata with URL key first. Since DiskLruCache acts as a ListMultimap in some sense, I'm guessing you might use an index for each variant, with the metadata mapping each request's Vary field values to it's corresponding index (variant).

mizosoft on 6 Aug 2020

I suspect anyone who hit this probably found it quicker to append a fake "?variant=XML" on the URL than explain the bug and request a fix. It seems like a useful feature if done right, there might even be broader issues here like cache pollution we aren't handling?

yschimke on 6 Aug 2020

👍1

I don't believe there's cache pollution; it'll just evict something it doesn't need to evict when you switch variants.

Because fixing this has a real runtime cost (more work for cache lookup) and unclear benefit my preference is to icebox this.

swankjesse on 6 Aug 2020

I don't believe there's cache pollution; it'll just evict something it doesn't need to evict when you switch variants.

Is that because if it's per user or per variant it just won't be very cacheable? Public caches would see problems well before we do?

yschimke on 6 Aug 2020

Its because if we have a variant mismatch we treat it like a stale cache entry. The old variant is not used and the new one will clobber it.

swankjesse on 6 Aug 2020

👍1

Going to close this out rather than icebox it. Hasn't been a priority and @mizosoft doesn't have a firm need for it.

yschimke on 8 Aug 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Sometimes I am getting the headers in lowercase

SandroMachado · 3Comments

Public API for toHumanReadableAscii(String) ?

mleibner · 3Comments

Fatal Exception: java.util.concurrent.RejectedExecutionException

TheLester · 3Comments

java.lang.IllegalArgumentException: sockaddrToInetAddress unsupported ss_family: 0

lyf571321556 · 3Comments

Test TLSv1.3 with conscrypt

yschimke · 3Comments