Shields: Caching of API responses

Created on 5 Mar 2018 · 12 comments · Source: badges/shields

In the past two weeks we've been getting many requests for https://liberapay.com/Changaco/public.json, which I assume are coming from shields following the deployment of #1251.

The requests often arrive in groups of 4, which is the number of Liberapay badges on the homepage. Here's a log extract:

[05/Mar/2018:09:03:29 +0000] 200 71187us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:29 +0000] 200 72585us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:35 +0000] 200 17306us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:36 +0000] 200 18380us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
...
[05/Mar/2018:09:03:41 +0000] 200 17871us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:43 +0000] 200 18523us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:43 +0000] 200 15926us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"
[05/Mar/2018:09:03:43 +0000] 200 18260us liberapay.com "GET /Changaco/public.json HTTP/1.1" 999 "-"

This shows that there is very little caching of these API responses, even though they contain a Cache-Control header with a value of public, max-age=3600. It would be nice if that could be improved.
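
For reference, honouring that header only takes parsing max-age and keeping the body until it expires. A minimal sketch of the client side (assuming node-fetch; I don't know which HTTP client shields actually uses):

// Minimal sketch, not shields' actual client: cache upstream JSON
// responses for as long as their Cache-Control max-age allows.
const fetch = require('node-fetch');

const responseCache = new Map(); // url -> { body, expires }

async function cachedGetJson(url) {
  const hit = responseCache.get(url);
  if (hit && hit.expires > Date.now()) {
    return hit.body; // still fresh per the server's max-age
  }

  const res = await fetch(url);
  const body = await res.json();

  // Respect e.g. `Cache-Control: public, max-age=3600`.
  const cacheControl = res.headers.get('cache-control') || '';
  const match = /max-age=(\d+)/.exec(cacheControl);
  if (match) {
    responseCache.set(url, {
      body,
      expires: Date.now() + Number(match[1]) * 1000,
    });
  }
  return body;
}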

Similar closed issue: #266.

Labels: core, question

Most helpful comment

@paulmelnikow - if you're an OSS project, which it seems you are, CloudFlare will give you a pro sub for free

All 12 comments

Hi, thanks for raising this. We could definitely do a more thorough job of caching requests made to services and it's something I'm happy to talk about and try to improve.

I think the behavior you're seeing is expected based on the implementation, though I don't think it implies there is little caching of these requests. As @espadrine mentioned in #266:

That's normal. Until we have the response cached, since we cannot send the corresponding information, we ask.

Meaning, if we get four simultaneous requests for an unknown project, they will all fire. It's only once the first response has come back that we have a cached result.

I admit I haven't fully grokked the vendor caching code (in lib/request-handler.js). I _think_ it fires a separate request for each distinct badge, meaning the cached result for liberapay/receives/Changaco will be reused for subsequent liberapay/receives/Changaco requests, but not for liberapay/gives/Changaco. In fact I'm pretty sure this is how it works, because we don't cache the raw API responses – which would be relatively large – but rather the badge's computed text.
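
If that's right, a rough sketch of the idea (hypothetical names, not the actual lib/request-handler.js code) would be to cache the rendered text per badge path and coalesce simultaneous requests for the same path:

// Hypothetical sketch, not the actual lib/request-handler.js code:
// cache the computed badge text per badge path, and coalesce simultaneous
// requests so four identical badges on one page cause one upstream call.
const inFlight = new Map();   // badge path -> Promise of badge text
const badgeText = new Map();  // badge path -> { text, expires }

function getBadgeText(badgePath, fetchAndRender, ttlMs) {
  const hit = badgeText.get(badgePath);
  if (hit && hit.expires > Date.now()) {
    return Promise.resolve(hit.text);
  }
  if (inFlight.has(badgePath)) {
    return inFlight.get(badgePath); // piggyback on the request already in flight
  }
  const promise = fetchAndRender(badgePath)
    .then(text => {
      badgeText.set(badgePath, { text, expires: Date.now() + ttlMs });
      return text;
    })
    .finally(() => inFlight.delete(badgePath));
  inFlight.set(badgePath, promise);
  return promise;
}

Under that model, liberapay/receives/Changaco and liberapay/gives/Changaco would each make their own upstream call, but repeats of the same path wouldn't.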

Another point to make is that we have three servers, and their caches are independent.

I'd love to improve our caching, possibly by using an external cache in Redis or memcached, and possibly syncing it between servers, though evolving that is substantially limited by our server resources. The cache size is tweaked to prevent OOM crashes. Our hosting budget is $18/month, which is incredibly small for the 500M requests we handle in that time. That's not to say we can't or won't improve this, just to put it in perspective. (Separately from this, we're asking for one-time $10 donations from people who use and love Shields… please donate and spread the word!)
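
If we did reach for Redis, the core of it could be as small as this (a sketch only, assuming the ioredis client; the key naming is made up):

// Sketch only: a shared cache in Redis so all three servers see the same
// entries and TTLs. Assumes the ioredis client; key naming is made up.
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);

async function getCachedBadgeData(key, ttlSeconds, fetchFresh) {
  const cached = await redis.get(`badge:${key}`);
  if (cached !== null) {
    return JSON.parse(cached);
  }
  const fresh = await fetchFresh();
  // EX sets the expiry in seconds, so Redis evicts stale entries for us.
  await redis.set(`badge:${key}`, JSON.stringify(fresh), 'EX', ttlSeconds);
  return fresh;
}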

Finally there are two simple things we could do to address the symptom you're observing. We could prerender the example badges, and/or not display the entire list of badges until people start to drill down into a category.

I've looked at the code and I think the first thing we should change is the underlying cache mechanism: it's currently a simple LRU instead of a time-aware algorithm. When the cache is full the "oldest" entry is dropped, but that entry may be valid for another hour while the "newest" may already be obsolete. This caching policy explains why raising the max-age of our API responses from 10 minutes to 1 hour didn't really help.

There is some heuristic logic that tries to reduce the frequency of refreshing data that is in the cache, max-age or no. So, it's possible this would cause a higher cache hit frequency overall, though it's also possible recency works well enough. Seems worth an experiment! What algorithm would you suggest instead?

Another issue is that I don't have access to the server logs, so @espadrine is in a better position to run experiments like this.

Also related: #1285.

I have no experience with this kind of algorithm, but it seems to me that we need to modify the data structures like this:

 function CacheSlot(key, value) {
   this.key = key;
   this.value = value;
-  this.older = null;  // Newest slot that is older than this slot.
-  this.newer = null;  // Oldest slot that is newer than this slot.
+  this.less_recently_used = null;  // Neighbouring slot used less recently than this one.
+  this.more_recently_used = null;  // Neighbouring slot used more recently than this one.
+  this.earlier_expiring = null;  // Neighbouring slot that expires before this one.
+  this.later_expiring = null;  // Neighbouring slot that expires after this one.
 }

 function Cache(capacity, type) {
   if (!(this instanceof Cache)) { return new Cache(capacity, type); }
   type = type || 'unit';
   this.capacity = capacity;
   this.type = typeEnum[type];
   this.cache = new Map();  // Maps cache keys to CacheSlots.
-  this.newest = null;  // Newest slot in the cache.
-  this.oldest = null;
+  this.most_recently_used = null;  // Most recently used slot in the cache.
+  this.least_recently_used = null;  // Least recently used slot in the cache.
+  this.earliest_expiring = null;  // Slot that expires soonest.
+  this.last_expiring = null;  // Slot that expires last.
 }

and, when cleaning the cache, drop the earliest_expiring slot if it has expired, otherwise fall back to dropping the least_recently_used entry.

(Edit: on second thought this may not be the right data structure, because inserting a new cache slot would no longer run in constant time.)
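
One way around that, assuming a single TTL per cache: insertion order is then also expiry order, so a plain Map gives constant-time insertion and constant-time access to the earliest-expiring entry. A sketch:

// Sketch: with one TTL per cache, the first entry in the Map is always the
// earliest-expiring one, so eviction stays O(1).
class TtlCache {
  constructor(capacity, ttlMs) {
    this.capacity = capacity;
    this.ttlMs = ttlMs;
    this.map = new Map(); // key -> { value, expires }, oldest first
  }

  set(key, value) {
    this.map.delete(key); // re-inserting moves the key to the "newest" end
    if (this.map.size >= this.capacity) {
      // Oldest insertion == earliest expiry, so this doubles as TTL-aware eviction.
      const oldestKey = this.map.keys().next().value;
      this.map.delete(oldestKey);
    }
    this.map.set(key, { value, expires: Date.now() + this.ttlMs });
  }

  get(key) {
    const slot = this.map.get(key);
    if (!slot) return undefined;
    if (slot.expires <= Date.now()) {
      this.map.delete(key); // expired: treat as a miss
      return undefined;
    }
    return slot.value;
  }
}

The trade-off is that eviction is by least-recently-written rather than least-recently-used; if per-entry TTLs are needed, a min-heap keyed on expiry would keep eviction cheap at the cost of O(log n) insertion.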

@Changaco, another option would be to set up a static endpoint and point the example badges at it. Then page loads at shields.io wouldn't be causing database queries at Liberapay.

Our backend is using three of the aforementioned LRU caches in total:

  • a request cache in the request-handler module.
  • an image cache in the svg-to-img module.
  • a badge key width cache in the make-badge module.

Their size is quite small: they can contain at most 1000 elements. With the high volume of distinct badge requests we're receiving, I'm not expecting cached elements to survive for very long before being forcibly removed.

According to the back-of-the-envelope calculations in the code, their respective memory footprints are expected to be around 5MB, 1.5MB, and a probably negligible amount for the last one. That doesn't seem like much at all, especially for the request-handler one; even our limited servers could probably handle increasing those cache sizes significantly. This seems like a quick win to avoid some redundant computation and somewhat lower the number of API requests we're making. Any opinions?
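
For a rough sense of scale (my own arithmetic from the estimates above, not measurements):

// Back-of-the-envelope only, using the figures quoted above.
const requestCacheBytes = 5 * 1024 * 1024;          // ~5MB across 1000 entries
const perEntryBytes = requestCacheBytes / 1000;     // ~5KB per cached entry
const tenfoldCacheMB = (10000 * perEntryBytes) / (1024 * 1024); // ~50MB for 10,000 entries
console.log({ perEntryBytes, tenfoldCacheMB });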

While responding to #2424 it occurred to me that we could run our own downstream cache in front of individual services, particularly for information which doesn't change often (like Liberapay) and/or is served from very slow APIs (like libraries.io). During our outage earlier this year someone suggested using the Google cache downstream of us: https://github.com/badges/shields/issues/1568#issuecomment-407860059. I imagine we couldn't feed JSON endpoints to the Google cache and expect them to work, though we could do something similar by putting Cloudflare in front of these requests.

Cloudflare Free will not let you rewrite Host headers to reverse-proxy someone else's site at a different name: https://community.cloudflare.com/t/can-we-use-cloudflare-as-a-reverse-proxy-to-a-wordpress-site/31051/2

So we'd have to set up our own reverse proxy – though we could place Cloudflare Free in front of _that_.
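
The proxy itself wouldn't need to be much (a sketch, assuming the http-proxy package; the target and port are placeholders):

// Sketch of the tiny reverse proxy we'd run ourselves, with Cloudflare Free
// in front of it. Assumes the http-proxy package.
const http = require('http');
const httpProxy = require('http-proxy');

const proxy = httpProxy.createProxyServer({
  target: 'https://liberapay.com',
  changeOrigin: true, // rewrite the Host header to the upstream's hostname
});

http.createServer((req, res) => {
  // The upstream's Cache-Control headers pass through untouched, so a CDN in
  // front of this proxy can cache the JSON responses for us.
  proxy.web(req, res, {}, err => {
    res.writeHead(502);
    res.end('Upstream error: ' + err.message);
  });
}).listen(8080);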

@paulmelnikow - if you're an OSS project, which it seems you are, CloudFlare will give you a pro sub for free

Seems that the public service could benefit from some additional caching. I frequently encounter this:

[screenshot]

Could you open a new issue? Sounds like we should amp up our monitoring to start. Caching API responses might help but probably would not fix an issue where many badges are broken at once.

Yep! New issue below. Thanks, Paul.

https://github.com/badges/shields/issues/3874
