Almanac.httparchive.org: Finalize assignments: Chapter 16. Caching

Created on 21 May 2019 · 11 Comments · Source: HTTPArchive/almanac.httparchive.org

Due date: To help us stay on schedule, please complete the action items in this issue by June 3.

To do:

[x] Assign subject matter expert (author)
[x] Finalize peer reviewers
[x] Finalize metrics

Current list of metrics:

TTL by resource
Resources served without cache
Cache strategy?
Cache TTL vs Content Age
Availability of Last-Modified vs. ETag validators
Validity of Dates in Last-Modified and Date headers
Set-Cookie on cacheable responses?
Use of Cache-Control: max-age vs. Expires
Use of Vary (how many dimensions, what headers, etc.)
Use of other Cache-Control directives (e.g., public, private, immutable)
1st Party vs 3rd Party Caching
Public vs Private
Use of must-revalidate
Service Worker caching
AppCache
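As a rough illustration of what "TTL by resource" and "Cache TTL vs Content Age" could mean for a single response, here is a minimal Python sketch. It assumes headers arrive as a dict with lowercased names; the function names (`freshness_ttl`, `content_age`) are hypothetical and are not taken from the HTTP Archive queries. It treats `max-age` as taking precedence over `Expires`, and approximates content age as `Date` minus `Last-Modified`.

```python
import re
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_http_date(value):
    """Parse an HTTP date header leniently; return None if it is unusable."""
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    # Treat dates without an explicit offset as UTC so subtraction is safe.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def freshness_ttl(headers):
    """TTL in seconds: max-age if present, else Expires - Date, else None."""
    cache_control = headers.get("cache-control", "")
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*\"?(\d+)", cache_control, re.IGNORECASE)
    if match:
        return int(match.group(1))
    expires = parse_http_date(headers.get("expires"))
    date = parse_http_date(headers.get("date"))
    if expires and date:
        return max(0, int((expires - date).total_seconds()))
    return None


def content_age(headers):
    """Seconds between Last-Modified and Date, or None if either is missing."""
    date = parse_http_date(headers.get("date"))
    last_modified = parse_http_date(headers.get("last-modified"))
    if date and last_modified:
        return int((date - last_modified).total_seconds())
    return None
```

Comparing the two values per resource would show, for example, a file that has not changed in a year but is only cacheable for an hour.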

👉 Optional AI (@paulcalvano): Peer reviewers are trusted experts who can support you when brainstorming metrics, interpreting results, and writing the report. Ideally this chapter will have multiple reviewers who can promote a diversity of perspectives. You currently have 1 peer reviewer.

👉 AI (@paulcalvano): Finalize which metrics you might like to include in an annual "state of third parties" report powered by HTTP Archive. Community contributors have initially sketched out a few ideas to get the ball rolling, but it's up to you, the subject matter experts, to know exactly which metrics we should be looking at. You can use the brainstorming doc to explore ideas.

The metrics should paint a holistic, data-driven picture of the third party landscape. The HTTP Archive does have its limitations and blind spots, so if there are metrics out of scope it's still good to identify them now during the brainstorming phase. We can make a note of them in the final report so readers understand why they're not discussed and the HTTP Archive team can make an effort to improve our telemetry for next year's Almanac.

Next steps: Over the next couple of months analysts will write the queries and generate the results, then hand everything off to you to write up your interpretation of the data.

Additional resources:

rviscomi

Most helpful comment

Sorry I'm late, and know this is closed, but any thought in measuring whether ETags actually work?

They don't work in Apache, for example, if gzip or br is used (as I would hope they would be!), and you won't ever get 304 responses. Try it at www.apache.org for example - gzipped resources return 200 on refresh but images (which are not gzipped) correctly return a 304. So they should be turned off and Last-Modified should be used instead. Apache is pretty popular, so I imagine this affects a non-trivial number of servers since ETags are enabled by default and most people turn on compression for performance reasons. Other servers may also have similar issues with them not actually working.

Also, in the past ETags were often based on the inode, which caused issues with load-balanced servers, but I'm not aware of anyone doing that anymore so not too worried about that. More worried about other implementation issues like the one Apache has. Though if we can measure both together then why not.

It would require hitting at least one resource twice though (once with no cache, and then again with it cached) to see if 200 or 304 is returned so not sure how doable that is.
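The two-request check described above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the helper names (`conditional_headers`, `classify_revalidation`, `fetch`, `probe`) are hypothetical, it uses the standard library's `urllib` rather than anything from the HTTP Archive crawl, and a 200 on the second request is taken as "validator ignored" even though the resource could genuinely have changed between the two requests.

```python
import urllib.error
import urllib.request


def conditional_headers(first_response_headers):
    """Build revalidation headers from the validators of a first response."""
    headers = {}
    etag = first_response_headers.get("etag")
    last_modified = first_response_headers.get("last-modified")
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def classify_revalidation(first_response_headers, second_status):
    """Label the outcome of the second (conditional) request."""
    if not conditional_headers(first_response_headers):
        return "no-validator"
    if second_status == 304:
        return "revalidated"
    if second_status == 200:
        return "validator-ignored"
    return "other"


def fetch(url, extra_headers=None):
    """GET a URL and return (status, headers with lowercased names)."""
    request_headers = {"Accept-Encoding": "gzip"}  # ask for compression, as browsers do
    request_headers.update(extra_headers or {})
    request = urllib.request.Request(url, headers=request_headers)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, {k.lower(): v for k, v in response.headers.items()}
    except urllib.error.HTTPError as error:
        # urllib reports non-2xx statuses, including 304, as HTTPError.
        return error.code, {k.lower(): v for k, v in error.headers.items()}


def probe(url):
    """Request a resource twice: once cold, once with its validators."""
    _, headers = fetch(url)
    second_status, _ = fetch(url, conditional_headers(headers))
    return classify_revalidation(headers, second_status)
```

Calling `probe` on a compressed resource and on an image from the same server would be one way to compare the two cases mentioned above.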

bazzadp on 4 Jun 2019

👍2

All 11 comments

Would be interesting to see metrics on:

Availability of Last-Modified vs. ETag validators
Use of Cache-Control: max-age vs. Expires
Use of Vary (how many dimensions, what headers, etc.)
Use of other Cache-Control directives (e.g., public, private, immutable)

mnot on 28 May 2019

👍2

Few more ideas

1st Party vs 3rd Party Caching
Public vs Private
Use of must-revalidate
Service Worker caching
AppCache usage (hopefully low)

paulcalvano on 3 Jun 2019

👍1

@paulcalvano @yoavweiss @colinbendell we're hoping to finalize the metrics for each chapter today. Could you edit https://github.com/HTTPArchive/almanac.httparchive.org/issues/18#issue-446806416 and update it with anything that's missing? I see a bunch of other metrics were discussed in the comments. When that's done please tick the last TODO checkbox item and close this issue. Thanks!

rviscomi on 4 Jun 2019

👍1

Not too late to add a metric if @paulcalvano sees fit. Just update the first comment.

rviscomi on 5 Jun 2019

Investigating how well ETag validation is supported would be great. Just to note -- that Apache bug is specific to mod_deflate; if you use MultiViews for negotiating encoding, it works fine (e.g., see www.mnot.net). That said, it'd be interesting to see how widespread that is.

Looking over https://cache-tests.fyi for inspiration, a few other things come to mind:

How common is it for sites to use non-lowercase cache-control parameters?
How common is it for sites to use invalid dates?
How common is it for sites to use Cache-Control: public (even though it usually isn't required)?
How common is it for sites to serve a Date and Age that don't make sense (see this paper)?
How many sites still use Pragma in responses (even though it doesn't mean anything)?
Do any sites put Set-Cookie on cacheable responses?
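Several of these questions can be answered from a single response's headers. The following Python sketch shows one possible shape for such a check; it is not how the Almanac queries were written. The function name `audit_cache_headers` and the flag strings are hypothetical, headers are assumed to be a dict with lowercased names, dates are checked strictly against the preferred `IMF-fixdate` form (so obsolete but still legal formats are flagged too), `Cache-Control` is split naively on commas, and "cacheable" is simplified to mean "neither `no-store` nor `private` is present".

```python
from datetime import datetime

IMF_FIXDATE = "%a, %d %b %Y %H:%M:%S GMT"


def is_imf_fixdate(value):
    """True if the value matches the preferred HTTP date format."""
    try:
        datetime.strptime(value, IMF_FIXDATE)
        return True
    except ValueError:
        return False


def audit_cache_headers(headers):
    """Return a set of flags describing questionable caching headers."""
    flags = set()
    cache_control = headers.get("cache-control", "")
    # Naive split: quoted directive values containing commas are not handled.
    names = [part.split("=", 1)[0].strip() for part in cache_control.split(",") if part.strip()]
    lowered = {name.lower() for name in names}

    if any(name != name.lower() for name in names):
        flags.add("non-lowercase-directive")
    if "public" in lowered:
        flags.add("uses-public")
    if "pragma" in headers:
        flags.add("uses-pragma")
    for header in ("date", "last-modified", "expires"):
        if header in headers and not is_imf_fixdate(headers[header]):
            flags.add("bad-date:" + header)
    # Simplified cacheability test (an assumption, not the full HTTP caching rules).
    cacheable = not (lowered & {"no-store", "private"})
    if cacheable and "set-cookie" in headers:
        flags.add("set-cookie-on-cacheable")
    return flags
```

Counting how often each flag appears across a crawl would give the kind of prevalence numbers asked about above.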

mnot on 5 Jun 2019

👍1

One additional thought: it might be worth adding an experimental headers section and including in-the-wild uses of Variance or Key (if any).

colinbendell on 5 Jun 2019

I think ETag validation would be out of scope for this because we aren't making a repeat request. I agree it would definitely be interesting to explore whether servers are returning 304 status codes to requests with valid ETags.

@mnot - great idea to look at the cache tests. I'll add some of these to the list.

On the topic of valid dates - I ran into many invalid Date and Last-Modified headers in a recent analysis I did, so it would be interesting to explore what is going on there.

paulcalvano on 5 Jun 2019

@colinbendell - do you have an example of Variance or Key headers? I'm not familiar with those.

paulcalvano on 5 Jun 2019

Hoping we can resolve the open questions about metrics and close this issue ASAP.

rviscomi on 6 Jun 2019

Last call for metrics. @paulcalvano please update the final list and close this issue today. (sorry, couldn't think of a caching pun)

rviscomi on 7 Jun 2019

👍1
