Got: Cache should include the promise of matching active requests

Created on 19 Feb 2020 · 18 comments · Source: sindresorhus/got

What problem are you trying to solve?

Performance improvement: multiple simultaneous requests with caching enabled should wait for the first request to resolve and reuse its cached result.

Describe the feature

A contrived example, but let's assume got is called twice simultaneously with the same link, and a third time after both have resolved.

const A = got(link, {..., cache: map});
const B = got(link, {..., cache: map});

const [a, b] = await Promise.all([A, B]);
console.log('isFromCacheA', a.isFromCache); // false
console.log('isFromCacheB', b.isFromCache); // false

// Call again and we get `true`
const c = await got(link, {..., cache: map});
console.log('isFromCacheC', c.isFromCache); // true

As the output shows, neither A nor B is served from the cache, while C is. What I'd like to see, ultimately, is a single request whose promise is stored in the cache map and then awaited to serve the cached response.

To get around this, one currently has to maintain their own map of pending promises to await before making subsequent requests:

const map = new Map();
const promiseMap = new Map();
async function gotCache(link) {
     if (promiseMap.get(link)) {
          // Wait for the in-flight request so the cache is populated first
          await promiseMap.get(link);
     }
     const prom = got(link, {cache: map});
     promiseMap.set(link, prom);
     return prom;
}
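The workaround above still issues a second got call after the await (which is then served from the cache). An alternative is to share the pending promise itself, so concurrent callers reuse one request. This is a hypothetical sketch, not Got's API: `dedupedGot` and `promiseMap` are made-up names, and `gotLike` stands in for any promise-returning fetch function such as a configured got instance.

```javascript
const promiseMap = new Map();

// Hypothetical wrapper: keep one in-flight promise per link so that
// concurrent callers share the same request instead of firing a second one.
function dedupedGot(gotLike, link, options) {
  if (promiseMap.has(link)) {
    return promiseMap.get(link); // reuse the pending request
  }
  const promise = gotLike(link, options).finally(() => {
    // Once settled, drop the entry so later calls hit the response cache
    // (or the network) instead of a stale promise.
    promiseMap.delete(link);
  });
  promiseMap.set(link, promise);
  return promise;
}
```

With this shape, two simultaneous calls for the same link resolve from a single underlying request, and calls made after settlement behave as before.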

All 18 comments

I have some experience with this, as I have worked on a library that implements a similar feature.
I built the dedupe function in async-deco to deal with exactly that: https://github.com/sithmel/async-deco

For an in-memory cache it is straightforward. For a distributed cache (Redis), it is definitely trickier. The problem is that you need distributed locking (in async-deco I used Redlock), and this can add overhead to a request that I haven't had the chance to measure yet.

I'll close this issue, since this will be possible to do as part of the "should accept different caching logic" point. I have to say this is a good idea; it would save us some time (10ms+). I am even doing this in my http2-wrapper package.

Just adding some extra thoughts.
In the case of multiple instances with distributed locking, you would completely lose any advantage because of the latency of acquiring/releasing the lock. So I think it is a no-go.

The reason I mentioned dedupe (async-deco) is that in reliable-get (the HTTP GET library I mentioned above) we knew in advance whether we would cache a resource. In that case you can dedupe the entire request, which is a very effective optimisation.

In RFC-compliant caching such as Got's, you only know whether you should cache after getting a response. So you basically can't apply that optimisation.

If you use an in-memory cache you have no latency, so the optimisation is not effective.

If you use an external cache (Redis, for example) you probably have multiple Node.js processes, so the optimisation is only effective in the lucky event that two requests for the same resource land on the same process.

So, in my opinion, it is not really worth it.

Got will follow the RFC but people will be able to change the behavior if they would like to.

latency of acquiring/releasing the lock

This is not C++, Node.js is single-threaded. Node.js is blazing fast, its I/O is slow.

Thanks @sithmel and @szmarczak

Reading up on the topic and looking at your examples, I have learnt a new term: _memoize_. It seems simple enough to decorate the _got_ function for my needs, where I don't have too many different requests. Of course, an improved approach would likely use a Keyv storage adapter, but I didn't really look into those too much, since the memoize decorator worked out best for me as I could use it higher up.

Thanks again

If you could share a working example of that memoization, that would be great :)

Here you go @rfgamaral https://github.com/5app/memoize

Thank you :)

@MrSwitch Just noticed that you could have used p-memoize or mem for this exact purpose. Any reason you have chosen not to? Just trying to understand my options here :)

@rfgamaral I didn't know about those beforehand. They all do very much the same thing. I needed to set cache expiry on rejected and resolved promises differently in my app, so I defined an option useCache(cachedItem){} to customise that on implementation (sorry, it's not really well documented).

Those others you referenced look far more established than mine.

@MrSwitch Thank you for that, and sorry to bombard you with so many questions, but "just one more" if you don't mind...

Are you using both memoization and Got's cache property? What exactly is your request/cache flow for both?

Or maybe you're just using memoization over Got's cache? It seems that Got will move in that direction with #875, right? If that's the case, what are the advantages of doing so?

@rfgamaral That's the reason I started this issue. Yes, I'm using _both_, but it's not ideal, because memoizing effectively duplicates the cached response.

I would have liked to understand the cache adapters in Got to see if there was a way to use them. However, based on my limited knowledge of the subject, the cache adapter would be thwarted by:

  • caching only occurs after the response, not while the request is pending;
  • I'd have to mutate the cached response object to override the expiry headers, which sounds hacky.

The benefit of memoizing is that it's a good all-round tool that can be applied higher up the call stack.

I hope that helps

#875 does sound like it would make it possible.

Personally, I'm trying to "replicate" RESTDataSource using Got with two features in particular:

  • Request deduplication to prevent double-invoking HTTP calls in rapid succession (memoization).
  • Resource caching of recent HTTP responses to avoid hitting internal APIs just to get the same data.

If I understand everything correctly, I believe I could rely on Got's built-in cache mechanism for resource caching and use mem for request deduplication (with a very short maxAge like 250ms or maybe a bit less).

If I implement both mechanisms correctly I can potentially get something like this:

  • Make a request with a memoized got;
  • If multiple requests were done in quick succession, return a memoized promise (if not expired);
  • If no memoized promise is available, check Got's internal cache for a response;
  • If a cached response is available, return that;
  • If no cached response is available, make a request to the origin endpoint and return the response (while adding the promise to memoized cache and the response to Got's internal cache).
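The flow above can be sketched without any external dependencies. This is a hypothetical illustration: `memoizePromise` is a made-up, hand-rolled stand-in for the `mem` package's short-lived memoization (its `maxAge` option plays the role of mem's option of the same name), and `fetchFn` stands in for a got call with `cache` configured.

```javascript
// Sketch of the flow above: a short-lived promise memo sits in front of
// the response cache, deduplicating requests made in quick succession.
function memoizePromise(fetchFn, {maxAge = 250} = {}) {
  const memo = new Map(); // key -> {promise, expires}

  return key => {
    const hit = memo.get(key);
    if (hit && hit.expires > Date.now()) {
      return hit.promise; // deduplicated: reuse the recent promise
    }
    // Miss or expired: fall through to the fetcher (which may itself
    // answer from Got's response cache) and memoize the new promise.
    const promise = fetchFn(key);
    memo.set(key, {promise, expires: Date.now() + maxAge});
    return promise;
  };
}
```

Usage would then look something like `const memGot = memoizePromise(url => got(url, {cache: map}), {maxAge: 250});`, so rapid repeat calls share one promise while older calls fall through to Got's cache.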

What do you think?

That's pretty much exactly what I'm doing for the happy path.

I added some extra stuff to handle exceptions https://github.com/5app/memoize/blob/master/README.md#options-memoizehandler-options

What exactly do you mean by "handle exceptions"? What kind of exceptions?

HTTP status codes: we wanted to cache 4xx and 5xx responses too, but for a few seconds rather than minutes.
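That differential-expiry idea can be sketched generically. This is a hypothetical illustration, not the 5app/memoize implementation: `memoizeWithErrorTtl`, `okTtl`, and `errorTtl` are made-up names, and the TTL values are examples only.

```javascript
// Sketch: memoize a request, but keep failed results for a much shorter
// time than successful ones (e.g. errors for seconds, successes for minutes).
function memoizeWithErrorTtl(fetchFn, {okTtl = 60000, errorTtl = 2000} = {}) {
  const memo = new Map(); // key -> {promise, expires}

  return key => {
    const hit = memo.get(key);
    if (hit && hit.expires > Date.now()) {
      return hit.promise;
    }
    const promise = fetchFn(key);
    memo.set(key, {promise, expires: Date.now() + okTtl});
    promise.catch(() => {
      // On failure, shorten the entry's lifetime instead of dropping it,
      // so a burst of retries is still throttled for a short while.
      memo.set(key, {promise, expires: Date.now() + errorTtl});
    });
    return promise;
  };
}
```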

Alright, thanks for the clarification :)

