Currently, if you want to cache some results from an Observable, you're forced to use publishReplay or a ReplaySubject or something to that effect. There are a few problems with this approach:
publishReplay isn't what most people would want for common scenarios. For example: if there is an error from the source, you'll likely want to retry it, but if there's not, you probably don't want to repeat it. (This community infamously had a passionate debate on that matter, LOL.) There is a need, overall, for a solid caching operator, and perhaps a set of refined versions of it, that make it simple and easy to do all sorts of caching. The problem is that there are _all sorts of caching_.
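To make the retry/repeat tension concrete, here is a minimal sketch in the RxJS 5 style used later in this thread (it assumes the UMD bundle is loaded as the global Rx). publishReplay multicasts through a single shared ReplaySubject, so what retry should do once the source has errored is exactly the behavior that was debated; the precise output has shifted between versions, so it's omitted here.

var cached = Rx.Observable.of(1, 2, 3)
  .concat(Rx.Observable.throw(new Error('boom')))  // a source that errors after emitting
  .publishReplay(2)                                // buffer the last 2 values in one shared ReplaySubject
  .refCount();

cached
  .retry(2)  // what should this do once the shared subject has errored?
  .subscribe(
    function (x) { console.log(x); },
    function (e) { console.error('ended with: ' + e.message); }
  );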
@jhusain brought up that he had some ideas around this, and I've talked to @abersnaze and @stealthcode about work they're doing around this same issue in RxJava. @mattpodwysocki and @benjchristensen may also have some thoughts.
@abersnaze brought up the Google CacheLoader, which I think is a good resource to understand the types of evictions that can occur.
Short list of common cache evictions:
(Caching is hard)
Possible cache modifying events in RxJS:
I think we should define a primary cache operator (or a set of operators, existing or non-existent, that can be composed to get all desired caching behaviors). From there, I think we should identify the most desired caching behaviors and create operators specific to those needs (with the primary cache operator driving them).
@blesh I experimented a long time ago with a memoize operator (since Observables are async functions, memoization should be possible) but I think it's mostly already covered by publishReplay, or so implementation-specific that it's impossible to solve generically without becoming Falcor or GraphQL.
I think it's mostly already covered by publishReplay
One of a few problems here is it's not retry-able.
I don't think the goal is to solve it generically, rather to enable people to compose the sort of caching they need from their Observables.
I think the basic idea might be something like:
cache<R>(
  nextHandler: (value: T, store: Map<R>) => Subscription<any>|function|void,
  errorHandler: (err: any, store: Map<R>) => Subscription<any>|function|void,
  completeHandler: (store: Map<R>) => Subscription<any>|function|void,
  startHandler: (subscriber: Subscriber<T>, store: Map<R>) => Subscription<any>|function|void
): CacheObservable<T, R>
... then again, caching would be multicast, so really perhaps it's a better idea to have a CacheSubject, with which you're able to more precisely define behaviors than with ReplaySubject...
I guess the real test would be whether the design allowed for all/most of the eviction types.
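Purely for illustration, a hypothetical call against the signature sketched above might put those eviction decisions in the handlers (nothing here exists; source, the handler bodies, and the keying scheme are made up):

var cached = source.cache(
  // nextHandler: store each value as it arrives
  function (value, store) { store.set(store.size, value); },
  // errorHandler: evict everything so a retry starts from a clean source
  function (err, store) { store.clear(); },
  // completeHandler: keep the store around for late subscribers
  function (store) { },
  // startHandler: replay whatever is currently cached to each new subscriber
  function (subscriber, store) {
    store.forEach(function (value) { subscriber.next(value); });
  }
);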
One of a few problems here is it's not retry-able.
.multicast(() => new ReplaySubject(cacheSize)).refCount() is retry-able
... But its repeat semantics are wrong. :/
Wrong how? (Just trying to understand)
By the way, this is weird:
Rx.Observable.of(10, 20, 30)
  .concat(Rx.Observable.throw('poop'))
  .multicast(() => new Rx.ReplaySubject(2))
  .refCount()
  .retry(5)
  .subscribe(x => console.log(x), e => console.error(e), () => console.info('|'));
// 10
// 20
// 30
// 10
// 20
// 30
Rx.Observable.of(10, 20, 30)
  .multicast(() => new Rx.ReplaySubject(2))
  .refCount()
  .repeat(5)
  .subscribe(x => console.log(x), e => console.error(e), () => console.info('|'));
// 10
// 20
// 30
// 10
// 20
// 30
In both cases it repeats just 2 times, and never terminates.
... oh, I should have been more specific.
In the second example you have, it's not really replaying cached values when you repeat it. It's just starting all over again.
The desired behavior for a cache (in most cases) would be:
1. repeat would repeat cached values (like publishReplay would)
2. retry would clear the cache, and go to 1.

... this might be accomplishable with the connection semantics that @trxcllnt described a long-long time ago.
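For what it's worth, one rough sketch of those semantics using only existing RxJS 5 pieces (defer plus a ReplaySubject that gets thrown away on error); cacheWithReset, source, and bufferSize are illustrative names, not a proposed API:

// Runs the source once, replays up to bufferSize values to every subscriber,
// and drops the buffer on error so that retry() re-runs the source from scratch.
function cacheWithReset(source, bufferSize) {
  var subject = null;
  return Rx.Observable.defer(function () {
    if (subject) {
      // cache hit: replay buffered values (and completion) to this subscriber
      return subject;
    }
    var fresh = new Rx.ReplaySubject(bufferSize);
    subject = fresh;
    source.subscribe(
      function (x) { fresh.next(x); },
      function (err) {
        subject = null;  // evict, so the next subscription (e.g. via retry) re-runs the source
        fresh.error(err);
      },
      function () { fresh.complete(); }
    );
    return fresh;
  });
}

With that shape, repeat() on a completed cache replays the buffered values, while retry() after an error rebuilds the subject and re-subscribes the source.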
You mean control? Should that be in beta, or later?
You mean control? Should that be in beta, or later?
It doesn't seem to be a major requirement, and if we add it, it won't be a breaking change. I'd love to see a proposal on it.
Is it possible to use some kind of "CachingStrategy"? The implementing operator/method would decide when the cache is valid... For HTTP it would be a simple strategy: GET/HEAD/OPTIONS with a 200 response are cacheable, maybe with some invalidation delay.
For other kinds, other strategies. I think it's too broad a problem to solve with universal code.
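A hedged sketch of that idea (cacheWithStrategy, the strategy shape, and the HTTP response fields are all made up for illustration): the strategy object decides what gets stored and whether a stored entry may still be replayed, so the HTTP-specific rules stay out of the generic code.

// Single-entry cache driven by a pluggable strategy; fits the request/response case.
function cacheWithStrategy(source, strategy) {
  var entry = null;
  return Rx.Observable.defer(function () {
    if (entry && strategy.isValid(entry)) {
      return Rx.Observable.of(entry.value);  // replay the cached response
    }
    return source.do(function (value) {
      if (strategy.shouldCache(value)) {
        entry = { value: value, timestamp: Date.now() };
      }
    });
  });
}

// Example strategy: cache 200 responses for 30 seconds (assumes the value exposes .status).
var httpStrategy = {
  shouldCache: function (response) { return response && response.status === 200; },
  isValid: function (entry) { return Date.now() - entry.timestamp < 30000; }
};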
As it stands, how can the desired behaviour be achieved?
The desired behavior for a cache (in most cases) would be:
- run the source Observable once and cache the values
- if completed, all subsequent subscribers would get the cached values
- if errored, retry would clear the cache, and go to 1.
Thanks
I'm hoping this discussion kicks off again...
It's important because the _desired behaviour_ is super common, at least in code I've written in Angular (see example below).
var foosFetched;

function fetchFoos() {
  // return the cached (possibly still pending) promise if we already have one
  if (foosFetched) return foosFetched;
  foosFetched = db('foos');
  return foosFetched.catch(err => {
    // forget the failed promise so the next call retries
    foosFetched = null;
    // "rethrow"
    return $q.reject(err);
  });
}
I even generalized the above into a utility method named memoizeAsync that produces a function that performs the above.
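For reference, a hedged guess at what that memoizeAsync helper might look like (the name is from the comment above; the body is illustrative and assumes AngularJS's $q, as in the snippet):

function memoizeAsync(fn) {
  var cached = null;
  return function () {
    if (cached) return cached;
    cached = fn.apply(this, arguments);
    return cached.catch(function (err) {
      cached = null;  // forget the failed promise so the next call retries
      // "rethrow" so callers still see the error
      return $q.reject(err);
    });
  };
}

// e.g. var fetchFoos = memoizeAsync(function () { return db('foos'); });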
PS. I'm not sure I can contribute to the conversation myself as I'm very new to rxjs :-(
This was closed by the addition of the basic cache operator with #1320
So the new basic cache operator - does it fulfill the three desired behaviours that you specified in this thread?
Thanks