quarkus 🚀 - Annotation-driven application data caching

Hey @galderz

I don't understand in which situation @CacheStore and @CacheInvalidate would be used. Since the cache is keyed on the method signature, it will only be used for that specific method call. When would it make sense then to always refresh the cache be still executing the method. Same fore the invalidation.

Furthermore, the invalidate all seems useful but likely via some API rather than a method?

```
@Inject MethodCacheManager cacheManager;

@CacheLoad
public List getHotToys(Country country) {
return expensiveService.getHotToys(country);
}

public void prepareNewRelease() {
// clear cache prior to release
cacheManager.clearCacheFor(Service::getHotToys);
cacheManager.clearCacheFor(Service::getHotCountry);
}

emmanuelbernard on 22 Jul 2019

👍1

Hello @galderz,
I think that two annotations are enought :
@Cacheable: put a value in a cache, can have a boolean to override a value if needed
@CacheEvict: evict a value from the cache

To evict all the value in a region we can use the API as @emmanuelbernard suggested.

loicmathieu on 23 Jul 2019

Eviction using an annotation is kinda useful. I know we used that with the Spring caching layer.

BTW, another interesting project that is existing for a while and might be a good source of inspiration for remote caches: https://github.com/ragnor/simple-spring-memcached .

We used it on one project at $previousJob and it worked pretty well.

gsmet on 23 Jul 2019

@emmanuelbernard @CacheStore is meant to be used when you want to update the cached key/value pair e.g.

@CacheStore
public List<Toy> updateHotToys(Country country) {
    List<Toy> toys = getHotToys();
    ... // update the list
    return toys;
}

How would you update a key/value pair otherwise?

WRT invalidate all, I wasn't envisioning having any other APIs for interacting with the caching API initially. I'd rather have an annotation-only approach to start with, along with ways to find out how the cache is performing (e.g. cache misses...etc) via statistics or monitored values. We can always add APIs later if we need, but I feel the above proposal is enough to have something that feels homogeneous.

Moreover, the problem with mixed API and annotation approaches is that it can be difficult to decide what to do where. Anything that does a CRUD on the cache should be done via annotations IMO. Anything that means querying a statistic (e.g. cache misses), should be done via API.

@loicmathieu I'm not keen on annotation boolean parameters that make them behave in such a distinct way. What's wrong with having separate annotations for different things? This is why I created @CacheInvalidateAll as opposed to having a boolean flag in @CacheInvalidate. I'd like to avoid annotations becoming dumping grounds for N boolean parameters.

galderz on 23 Jul 2019

A few items:

We need to declare the semantics of invocation of the underlying method, i.e. whether this is synchronous, akin to a computeIfAbsent while holding a lock on the entry, thus avoiding repeated concurrent invocations of the method in case of a miss.
If the underlying method is non blocking (i.e. it returns CompletionStage) wire it as a non-blocking caching op. Otherwise we may want to wrap it in an executor ? @cescoffier what are the semantics in quarkus for this kind of thing?
I also like the idea of a caching deadline: if the method or the cache.get don't return within a set amount of time, provide either an alternate value or some fault handling. Again, @cescoffier, what are the semantics/rules on invocation timeouts ?

tristantarrant on 23 Jul 2019

@galderz oh so by default, all @Cache* something belong in the same name space? I thought it would be qualified by the method name but apparently not if updateHotToys and getHotToys can interact with one another.

emmanuelbernard on 23 Jul 2019

Whether we go for the single annotation like @loicmathieu is proposing (with a flag to force update) or or different annotations, I do feel that the proposed initial names are confusing me. Alternative proposal

@Cached
@RefreshCache
@EvictCache
@EvictAllCache

not in love either

emmanuelbernard on 23 Jul 2019

I'd rather have all annotations starting with Cache for the sake of clarity. Think that current developement model is really bloated with annotations and sometimes you have 5 differents, comming from 3 different APIs. Having all annotations starting with Cache will be easier to use.

@Cached, @CacheRefresh, @CacheEvict => annotations working at the cache entry level

For working at the region level (flush an entire region) or at the cache manager level (flush all regions, get info,get statistic) I'd rather use the API as it's linked to application management IMHU.

Regarding asyn cache => YES!

Regarding blocking cache: we need to be really careful, I know that it's convenient but it is not a one-size-fit-all functionality. I have in mind a couple of use cases where a blocking cache crash the application:

A cache region containing a single entry with the list of category of a website that takes 20s to load. All threads blocked on the loading of the cache and the website never start (we ended up by using a cache loader)
The case of null ... a lot of thread are waiting on a cache and are loading null values that are not stored in the cache so ... all threads are waiting one by one for a single lock => thread contention (we ended up storing null values in the cache).

So blocking cache should be opt-in not by default.

Also, consider a cache loader API to be able to pre-populate a cache at startup.

loicmathieu on 23 Jul 2019

Anything that does a CRUD on the cache should be done via annotations IMO.

In my application code, I don’t use annotation-based programming heavily. I use it for simple static cases, like Guice and RestEasy, which offer good escape hatches back to regular code. In a cache there are a lot of possible scenarios, such as using various Map computation methods or attaching callbacks.

Do you envision this feature to be a convenience for simple cases or the preferred programming model? As in, should developers learn the annotations instead of the provider or use either where most appropriate?

ben-manes on 23 Jul 2019

@emmanuelbernard

@galderz oh so by default, all @Cache* something belong in the same name space? I thought it would be qualified by the method name but apparently not if updateHotToys and getHotToys can interact with one another.

^ I made a mistake... by default each would work on separate namespaces (as described in initial description). Here's a full example with both working on same namespace:

@CacheLoad(cacheName="toys")
public List<Toy> getHotToys(Country country) {
    return expensiveService.getHotToys(country);
}

@CacheStore(cacheName="toys")
public List<Toy> updateHotToys(Country country) {
    List<Toy> toys = getHotToys();
    ... // update the list
    return toys;
}

It's worth noting that I've not yet considered what to do in situations like this:

@CacheStore(cacheName="toys")
public List<Toy> appendHotToys(Country country, Toy toy) {
    List<Toy> toys = getHotToys();
    ... // update the list with toy parameter
    return toys;
}

Country is the key, how do we signal that? There are at least two camps: use annotations, e.g. @CacheKey to signal which parameter is part of the key, or use an attribute in CacheStore which indicates which parameter(s) are part of the key, e.g. `@CacheStore(key="country")...

A third approach I had thought about is partial functions (see here), but it'd be way more clunky, and even if you could create a set of partial function definitions to be used, some users might not fully understand it.

Whether we go for the single annotation like @loicmathieu is proposing (with a flag to force update) or or different annotations, I do feel that the proposed initial names are confusing me.

I chose the names based on common names associated with caching. Cache loads, cache stores and cache invalidations... To be more precise, I thought that:

@CacheLoad would try to load the value returned by the method from the cache, if not it'd call the method and store it... It's true that it's not just loading...
@CacheStore would store the value returned into the cache independent of whether the value is present or not.
@CacheInvalidate would invalidate the entry from the cache. Could have used @CacheEvict but I feel invalidate is a more natural fit for a caching use case.

Regardless of the chosen names, I agree with @loicmathieu that having them all start with Cache prefix would make it easier to type/autocomplete for the end user.

galderz on 24 Jul 2019

I don't know if it's a type but CacheLoader has a different meaning : it is usually a way to pre-load a cache before it is used. So CacheLoad is better.

loicmathieu on 24 Jul 2019

@tristantarrant:

We need to declare the semantics of invocation of the underlying method, i.e. whether this is synchronous, akin to a computeIfAbsent while holding a lock on the entry, thus avoiding repeated concurrent invocations of the method in case of a miss.

If the underlying method is non blocking (i.e. it returns CompletionStage) wire it as a non-blocking caching op. Otherwise we may want to wrap it in an executor ? @cescoffier what are the semantics in quarkus for this kind of thing?

+1 on needing to declare the semantics.

If the return type is not async, I'd expect the cache call to be synchronous. If the return type is async, the cache call would be asynchronous and when that completes it'd complete the method return type.

I do have doubts on whether to lock or not. Depends on the type of annotation and whether async or sync:

For @CacheLoad and synchronous: whether to lock or not depends on the cost of locking vs the cost accessing the resource to retrieve the data. If N threads are calling such method concurrently with the same parameter values and you don't lock, the threads are not getting blocked but you might hammer the source of data but the value you get is the same. If locking, the threads get blocked in synchronous (unless we spin them internally), but you put less strain on the source of data... So it depends really. IMO, we shouldn't lock by default in this case.
For @CacheLoad and asynchronous: you have the similar problems but the advantage is that threads are not waiting for the lock. When locks is acquired they can be signalled to continue work and complete the method return type. So, I'd lock by default in this case.
For @CacheStore, no matter if async/sync: 2 threads could be calling the method and updating the cached entry in different ways. I think you need locking here regardless.
For @CacheInvalidate, no matter if async/sync: no locking. Invalidation should be optimistic.

galderz on 24 Jul 2019

@loicmathieu

I don't know if it's a type but CacheLoader has a different meaning : it is usually a way to pre-load a cache before it is used. So CacheLoad is better.

Should be @CacheLoad. Spelling mistake 😅

galderz on 24 Jul 2019

@tristantarrant

I also like the idea of a caching deadline: if the method or the cache.get don't return within a set amount of time, provide either an alternate value or some fault handling. Again, @cescoffier, what are the semantics/rules on invocation timeouts ?

I agree the need for the user to able to configure the timeout, but not sure about the need of alternate value of fault handling. Let's look at the use cases:

For @CacheLoad if underlying cache get times out, the method would be just invoked (same as if the cache did not contain the key). If the cache put that happens after method invocation times out, that could be ignore too. The value is not cached and the user goes on with things. Stats updates would highlight this, maybe track number of timeouts?
For @CacheStore, if the cache put times out, I think the sane thing to do would be for the internal implementation to invalidate the key in a deterministic way. Without this, any other user that calls a @CacheLoad method would get an outdated value. Alternatively, we could maybe invalidate the key first internally and then put it, so if the put times out, any user that calls @CacheLoad would not find it in cache and would call the method and get the correct value.
For @CacheInvalidate: no timeouts should be possible here. The underlying impl should remove the cache entry regardless of whether entry is locked or not.

galderz on 24 Jul 2019

@loicmathieu WRT locking, we seem to have similar ideas. I've split the decision on whether to lock or not depending on the use case and depending on whether caching is sync/async.

galderz on 24 Jul 2019

@emmanuelbernard

@galderz oh so by default, all @Cache* something belong in the same name space? I thought it would be qualified by the method name but apparently not if updateHotToys and getHotToys can interact with one another.

^ I made a mistake... by default each would work on separate namespaces (as described in initial description). Here's a full example with both working on same namespace:
@CacheLoad(cacheName="toys")
public List<Toy> getHotToys(Country country) {
    return expensiveService.getHotToys(country);
}

@CacheStore(cacheName="toys")
public List<Toy> updateHotToys(Country country) {
    List<Toy> toys = getHotToys();
    ... // update the list
    return toys;
}
It's worth noting that I've not yet considered what to do in situations like this:
@CacheStore(cacheName="toys")
public List<Toy> appendHotToys(Country country, Toy toy) {
    List<Toy> toys = getHotToys();
    ... // update the list with toy parameter
    return toys;
}
Country is the key, how do we signal that? There are at least two camps: use annotations, e.g. @CacheKey to signal which parameter is part of the key, or use an attribute in CacheStore which indicates which parameter(s) are part of the key, e.g. `@CacheStore(key="country")...

A third approach I had thought about is partial functions (see here), but it'd be way more clunky, and even if you could create a set of partial function definitions to be used, some users might not fully understand it.

@CacheStore(cacheName="toys")
@CacheKey(provider=CountryKeyProvider.class)
public List<Toy> appendHotToys(Country country, Toy toy) {
    ...
}

public CountryKeyProvider implements KeyProvider {
    // do we need Method as parameter?
    public Object[] key(Object[] parameters) {
        return new Object[] { parameters[0] };
    }
}

emmanuelbernard on 24 Jul 2019

Maybe people already did that but could we take a look at the Spring cache annotations: https://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/cache/annotation/package-summary.html ?

A lot of people use them and are (mostly) happy with them. And someone will probably say at some point that we need a compatibility layer so they might be a good starting point.

Not saying we should be tied to them but I think it's great input to build our own.

gsmet on 24 Jul 2019

👍1

@ben-manes Great feedback, let me write my thoughts:

In my application code, I don’t use annotation-based programming heavily. I use it for simple static cases, like Guice and RestEasy, which offer good escape hatches back to regular code. In a cache there are a lot of possible scenarios, such as using various Map computation methods or attaching callbacks. Do you envision this feature to be a convenience for simple cases or the preferred programming model? As in, should developers learn the annotations instead of the provider or use either where most appropriate?

With this proposal, I've tried to get to the essence of application data caching:

Allow data coming from an expensive service to be cached locally for fast access.
Allow cached data to be updated when updating it in the expensive service.
Allow cached data to be invalidated when removing it from the expensive service.

I don't think you need an API to do that. I don't think you even need an API to make it performant. I think if we think carefully about the use cases, provide async and sync ways to do things, and think about locking and timeouts, we can do all of this without the user having to call any API manually.

I think an API of some sort might be needed to expose information to the user to know how the cache is behaving and how effective it is: cache misses, latency of cache calls,...etc. I have not thought about this yet. I'm open to suggestions!

That doesn't necessarily mean that there are not users out there who don't want closer control on their caches. My proposal is designed to fulfil requirements of those developers who feel at home with annotation-driven development. The success of frameworks such as Spring Boot shows (IMO) that there's a big portion of developers who want this.

So, what shall we do for those who want closer control on caching? I think there's two options here:

We could create a caching API in a second phase. I think the first phase should be just annotations as proposed above.
Let JCache API (wo/ the annotations) fill that void. Having been in the JSR-107 expert group, I've seen how it's impossible to make everyone happy with a caching API. I feel at times JCache tries to do too much with some of the things half baked (e.g. configuration), and at times doesn't do enough (e.g. no async). The benefit that it has is that it's a standard and it's become a lowest common denominator amongst the Java caching industry.

I don't know which of the two options is best. I think we could make something good with the first option, but I fear we'd encounter similar challenges as we found while developing JCache.

galderz on 24 Jul 2019

I exchanged a bit with Sanne on the timeout idea. I am also interested in the answer to Galder's question: should the fallback on timeout be the regular method, it sounds reasonable.

Also if we do @Cache(timeout=3600), I think we would also need something like quarkus.cache.method.f.q.c.n.MyService.getHotToys.timeout=3000 (aligned with how MP does these overridings per method).

But on the fallback approach, we could in theory make it totally separated

public class ToyService {
    @FallbackOnTimeout(timeout=3600, fallback="this::backupToys")
    public List<Toy> getHotToys() {
        return hopeFullyFastService.getToys();
    }

    public List<Toys> backupToys() {
        return someOtherToyService.getToys();
    }
}

This can be an interceptor doing the work.
In the reactive case it would be wired with something like

ReactiveResult<List<Toy>> getHotToys(...) { ... }
ReactiveResult<List<Toy>> backupToys(...) { ... }

#something like that
service.getHotToys().withTimeout(3600).recoverWith(serrvice.backupToys());

This reactive wrapping could be done under the hood via the annotations and an interceptor

emmanuelbernard on 24 Jul 2019

To reply to @ben-manes and @galderz , the other advantage with the annotation based model is that with Quarkus build time extension, you can inline the calls with bytecode enahncement.

emmanuelbernard on 24 Jul 2019

Thanks @galderz, @emmanuelbernard, et al. I like how this discussion is handling the various feedback.

For @CacheLoad and synchronous... IMO, we shouldn't lock by default in this case.

For an in-memory cache, the preference would be double-checked locking style (lock-free read => lock entry => compute if absent). This affords the perf benefits when present and avoids the query storms when absent. A lookup might be useful, but I don't know when a non-blocking cache load would be in the general case.

caching deadline...

If an asynchronous cache, how far can you get using CompletableFuture.orTimeout(duration)? In Caffeine, when done within the cache loader it is removed from the cache when it fails. To return a default value and not cache that, CompletableFuture.copy with an exception handler can be used after retrieving the future from the cache. This avoids building in those features directly into the library and using standard Java apis, which is always really nice.

Is this feature available for a synchronous cache? If so, how? It might require interrupting the computation which has never been natural in Java.

We could create a caching API in a second phase. I think the first phase should be just annotations as proposed above.

I very much agree with your feelings here. I do not think an annotations + API are compatible for cache operations, and the API limited to management functionality. The two can be hard to work together, e.g. JCache lets one configure a CacheLoader/ CacheWriter / CacheEntryListener which will be invoked when a @CacheResult method is called. That isn't compatible as it uses a generated key from the arguments, with the RI not allowing access to those key params. The semantics are different such that annotation-based may not use most JCache features (but the spec lead disagrees, which adds more confusing since the code doesn't work). Similar problems liter the JSR due to a non-orthogonal api design.

Spring Cache is much more successful thanks to the developers being adamant in its limited scope (no configuration, management only api, minimal feature). They push back on feature creep by saying it is only for the simple cases, and that it is okay to use the implementation directly for more advanced needs. I think that is a good pragmatic stance to mimic.

annotation based model is that... you can inline the calls with bytecode enahncement

I vaguely recall the default proxy-based enhancements in Spring being a surprise when calling methods within a class, as that silently bypasses the proxy. So that is definitely a nice win.

With this proposal, I've tried to get to the essence of application data caching...

I agree, and I wouldn't mind if a coworker used it. For simple cases it might be clearer for work with, which is great. My hope is that this doesn't expand to cover advanced cases as that results in incompatibilities (internal and across providers). I'd also hope that users would know that it is okay to use the provider directly when more appropriate; to not be religiously strict one way or the other. There are many common cases this proposal does not address, which is perfectly fine if that sentiment is shared.

ben-manes on 24 Jul 2019

Based on @ben-manes feedback, I agree that a API level will come (even if later).
I wonder how we should proceed on the annotation design vs the API design then:

consider them unrelated and incompatible when worked together (probably not)
design the annotation based approach and the API one after (i.e. in a separate issue)
try and design both here (or enough that we have confidence) but start with the annotation implementation

I have been using 3 in the past and that's the belt and suspender method. It adds a bit of noise but when balanced correctly increase the design quality with on a bit of delay to the feature.

Also, would want to give it s shot ;)
We will definitely want a reactive API and a imperative one.

emmanuelbernard on 25 Jul 2019

Oh BTW I agree, do the 80% use cases well and accept it won't do everything makes sense for the Quarkus usage. We know people will be able to fall back to full Caffeine, Infinispan, etc APIs when they need more.

emmanuelbernard on 25 Jul 2019

@emmanuelbernard

@CacheStore(cacheName="toys")
@CacheKey(provider=CountryKeyProvider.class)
public List<Toy> appendHotToys(Country country, Toy toy) {
    ...
}

public CountryKeyProvider implements KeyProvider {
    // do we need Method as parameter?
    public Object[] key(Object[] parameters) {
        return new Object[] { parameters[0] };
    }
}

This resembles similar ideas to JCache's CacheKeyGenerator. It uses a class called CacheKeyInvocationContext to wrap invocation information. It adds target object as well parameters. Worth noting that JCache has both options: @CacheKey and pluggable CacheKeyGenerator. The former is easier to use, the latter more powerful. Shall we have both options?

galderz on 26 Jul 2019

@gsmet Re: Spring cache annotations - Yeah, both Spring Cache annotations and JCache annotations have been inspiration for the suggestions here.

galderz on 26 Jul 2019

@emmanuelbernard Re: fallback

@FallbackOnTimeout(timeout=3600, fallback="this::backupToys")

I like the idea of having a separate concept and like the use of method references.

How would you deal though with situations where the backup method requires a parameter? If the expected parameter(s) are already passed to the getHotToys, then I guess passing it on to backupToys would be easy to do.

But what about if the fallback method requires a different parameter type and/or value? Would this be too far fetched?

I did a little experiment and you could use something akin to partial functions to do this for the reactive case (see here). Not very user friendly though :| . Minor thing to note in the code sample is the use of Java Time Duration instead of a long/int for timeout.

galderz on 26 Jul 2019

@ben-manes

For an in-memory cache, the preference would be double-checked locking style (lock-free read => lock entry => compute if absent). This affords the perf benefits when present and avoids the query storms when absent.

Hmmmmm, the problem I have with this is the following: say you have 100 threads doing calling such a method simultaneously. All 100 see the entry is not cached. One of them goes, acquires the lock an goes to the service... What do the other 99 threads do? In the sync case they'd have to wait around for the lock to be released for N seconds. In the async/reactive case is not so bad because those threads can be doing other things.

We've had similar issues in the past with Hibernate 2L cache implementations with Infinispan and JBoss Cache. What used to happen there is that all 100 threads would go to the database (so in our case they'd all execute the method) and when they go to update the cache, they do a putIfAbsent with N second lock timeout. One thread will succeed the put and the others were waiting around. We ended up switching this so that the putIfAbsent would wait 1ms for timeout. Because if it fails to acquire the lock, someone else is putting the same thing on the cache and we can just send back what we read from the DB to the user. This has worked better of us.

Of course, everything depends on the behaviour characteristics of the source of data.

I have noticed that Spring Cacheable has sync=false so it favours my suggestion of not locking. I think they do it for a good reason. They favour performance of app vs hammering the source of data.

I don't think doing that blindly it's a good idea either. That's why I suggested a middle ground earlier: lock when using async, but don't lock with sync. It might sound a bit crazy/ confusing but this way in the async case threads can do other things. In the sync case you have created a bottleneck.

A lookup might be useful, but I don't know when a non-blocking cache load would be in the general case.

^ What do you mean exactly?

If an asynchronous cache, how far can you get using CompletableFuture.orTimeout(duration)? In Caffeine, when done within the cache loader it is removed from the cache when it fails. To return a default value and not cache that, CompletableFuture.copy with an exception handler can be used after retrieving the future from the cache. This avoids building in those features directly into the library and using standard Java apis, which is always really nice.

This has been a hot topic for us recently. @emmanuelbernard and @cescoffier have been looking at the use of standard Java APIs for reactive APIs. I'll let them explain better their findings.

Re: timeouts

Is this feature available for a synchronous cache? If so, how? It might require interrupting the computation which has never been natural in Java.

Hmmmm, very good point. I don't know TBH. It all depends on whether sync calls to the service can be interrupted at all. I don't think we can now it advance. Maybe this is something that we can only guarantee for the reactive/async version? Unless there's some magic bytecode manipulation we can do? @stuartwdouglas @Sanne @emmanuelbernard?

galderz on 26 Jul 2019

@ben-manes Some more replies:

There are many common cases this proposal does not address, which is perfectly fine if that sentiment is shared.

^ What other common cases did you have in mind? Could you share them here briefly? It'd help completeness.

galderz on 26 Jul 2019

@emmanuelbernard @ben-manes

I wonder how we should proceed on the annotation design vs the API design then:

consider them unrelated and incompatible when worked together (probably not)

design the annotation based approach and the API one after (i.e. in a separate issue)

try and design both here (or enough that we have confidence) but start with the annotation implementation

I think my preference would be 2.

JCache was essentially an attempt to do 3 which ended up with inconsistencies (as noted by @ben-manes here and here). It is true though that the inconsistencies could have been fixed...

galderz on 26 Jul 2019

@ben-manes One more thing about to lock or not: whichever default we end up agreeing on, we should have an option to switch on/off the locking in similar fashion to Spring's Cacheable#sync. This would avoid locking ourselves up in one case or the other.

galderz on 26 Jul 2019

@emmanuelbernard Re: fallback

@FallbackOnTimeout(timeout=3600, fallback="this::backupToys")

I like the idea of having a separate concept and like the use of method references.

Be careful, this is not a method reference. This is method reference-like in a String.
You can't do method reference in annotations as method references are instances.

How would you deal though with situations where the backup method requires a parameter? If the expected parameter(s) are already passed to the getHotToys, then I guess passing it on to backupToys would be easy to do.

Good point, you would need the same list of parameters. Hum, disappointed :)

But what about if the fallback method requires a different parameter type and/or value? Would this be too far fetched?

We need to explore with pseudo examples, but it's here to avoid a if statement and control the backup flow for the user so I suppose having the same set of parameters is likely fine.

I did a little experiment and _you could_ use something akin to partial functions to do this for the reactive case (see here). Not very user friendly though :| . Minor thing to note in the code sample is the use of Java Time Duration instead of a long/int for timeout.

yes I don't get it looking at the code a minute or two. That's my threshold to keep things simple. and it works for me too ;)

Also, the reactive case is likely easy enough to write plainly by chaining the result types assuming correct operators. The hard case of a user is the synchronous one.

emmanuelbernard on 26 Jul 2019

This resembles similar ideas to JCache's CacheKeyGenerator. It uses a class called CacheKeyInvocationContext to wrap invocation information. It adds target object as well parameters. Worth noting that JCache has both options: @CacheKey and pluggable CacheKeyGenerator. The former is easier to use, the latter more powerful. Shall we have both options?

Ok so better align in name, though you scare me about the invocation context thing.

As for getting both options, here is my take: if we think >80% of the usage are going to take app method parameters into account, then I would go for the no annotation option and the full flexible provider option.
If we think that it's more that a subset of a method parameter are the key, then we will need the @CacheKey but I could not read anywhere of that's only for a single param or if you can annotate several parameters with @CacheKey

I also noticed that there is @CacheValue which is necessary to update a cache on changes. In your current proposal @galderz, you do require that update method to return the updated value. I don't know if that will be considered annoying or not.

emmanuelbernard on 26 Jul 2019

Thanks @galderz,

all 100 threads would go to the database... and when they go to update the cache... One thread will succeed the put and the others putIfAbsent... wait 1ms for timeout... If it fails to acquire the lock, someone else is putting the same thing on the cache and we can just send back what we read from the DB to the user.

vs

One of them goes, acquires the lock an goes to the service... What do the other 99 threads do? In the sync case they'd have to wait around for the lock to be released for N seconds.

The sync case is the positive outcome, not the negative. In the first case you have put a high load on your database: the query may be slow and redundant work is wasting capacity. Each thread is performing network, entity hydration, etc work needed to instantiate the value. The resource contention means that the fastest result from the 100 parallel queries will be slower performing just the single query. The 99 threads did no useful work, performed costly redundant work, and slowed down unrelated work sharing those resources (e.g. local cpu, db).

The sync case allows 99 threads to be context switched out. The load is performed once and all 100 threads return it. While the 99 threads are inactive, the local system can service other requests or perform the loading work (e.g. decompression) with much less resource contention.

I have noticed that Spring Cacheable has sync=false so it favours my suggestion of not locking. I think they do it for a good reason. They favour performance of app vs hammering the source of data.

I think they did it because not all caches supported synchronous loading with per-entry locks (including their default, ConcurrentHashMap) and assumed it would require a global lock that also blocked reads on a cache hit. A valid implementation is lock-free on a cache hit and locking only on the miss. Spring added the sync attribute later, when revisiting Spring Cache to migrate the Guava adapter to become Caffeine's and simultaneously flushed their backlog.

I suggested a middle ground earlier: lock when using async, but don't lock with sync. It might sound a bit crazy/confusing but this way in the async case threads can do other things. In the sync case you have created a bottleneck.

The sync bottleneck shifts from cheaply waiting on the lock to instead consume limited resources and increasing the response times. The exact same model is supported by an async cache. The future is inserted into the cache, the work dispatched, and all 100 threads are given the future. They can do other things and again the value is computed only once, not 100 times.

A lookup might be useful, but I don't know when a non-blocking cache load would be in the general case.
^ What do you mean exactly?

haha, I actually meant your case of sync=false. I agree, it should be an option either way and we seem to be disagreeing on whether it should be on by default. In Caffeine/Guava, we try to nudge people towards loading through the cache as a best practice.

What other common cases did you have in mind? Could you share them here briefly?

Refresh: reload async if accessed after a time interval as the value may be stale
- Like expiration, except that evicts so next request is blocking. This is optimistic.
- May use old value as fast check, e.g. by timestamp, before expensive load
- Common complaint in Spring Cache, since it doesn't configure Caffeine in a manner to enable its support
Removal notifications: the key, value, and cause
- This might be atomic on removal or async (afterwards), depending on the user's requirements
Bulk load: optimize for multiple key lookups

It is also happens that users sometimes try to use a Map-based cache for single item memoization. In that case, I try to advise using Guava's Suppliers.memoize or Suppliers.memoizeWithExpiration as much ligherweight. There may be value in deciding how to answer the no-key case in your designs (not applicable, users provide a surrogate key, allow no key (internal surrogate), or new annotations).

ben-manes on 27 Jul 2019

@galderz

By default, each annotated method use a namespaced cache whose name is derived from the type, method and parameter type information.

I am not sure to understand how this sentence should be applied to the following case:

@CacheLoad
public List<Toy> getHotToys(Country country) {
    return expensiveService.getHotToys(country);
}

@CacheInvalidateAll
public void clearHotToys() {
}

My understanding is that these two annotations would use different namespaced caches since the methods signatures are different. Then how would we invalidate all entries from the @CacheLoad annotated method? When would @CacheInvalidateAll be used without declaring an explicit cacheName parameter? Same question goes for @CacheStore and @CacheInvalidate.

gwenneg on 29 Jul 2019

On Fri, Jul 26, 2019 at 5:39 PM Galder Zamarreño notifications@github.com
wrote:

I don't think doing that blindly it's a good idea either. That's why I
suggested a middle ground earlier: lock when using async, but don't lock
with sync. It might sound a bit crazy/ confusing but this way in the async
case threads can do other things. In the sync case you have created a
bottleneck.

I don't think having different behaviors is a good idea.

And I also think that in a lot of cases, you don't want to hammer your
datasource even in the sync case. At least, that's what we did at
$previousJob: we preferred waiting for the result being available in the
cache before pursuing: the database was the performance sensitive part and
it's often the case.

--
Guillaume

gsmet on 29 Jul 2019

@galderz

By default, each annotated method use a namespaced cache whose name is derived from the type, method and parameter type information.

I am not sure to understand how this sentence should be applied to the following case:
@CacheLoad
public List<Toy> getHotToys(Country country) {
    return expensiveService.getHotToys(country);
}

@CacheInvalidateAll
public void clearHotToys() {
}
My understanding is that these two annotations would use different namespaced caches since the methods signatures are different. Then how would we invalidate all entries from the @CacheLoad annotated method? When would @CacheInvalidateAll be used without declaring an explicit cacheName parameter? Same question goes for @CacheStore and @CacheInvalidate.

The no-cacheName annotations actually make sense in the JCache specification because the cache name defaults to CacheDefaults.cacheName(). Do we want a CacheDefaults annotation in Quarkus too?

gwenneg on 31 Jul 2019

The no-cacheName annotations actually make sense in the JCache specification because the cache name defaults to CacheDefaults.cacheName(). Do we want a CacheDefaults annotation in Quarkus too?

If we have a global default, it's better to have it as a Quarkus property.

quarkus.cache.methods.default-cache=caramba

emmanuelbernard on 31 Jul 2019

@emmanuelbernard quarkus.cache.methods.default-cache looks like an application-wide default cache name property while JCache's CacheDefaults is only a class-level annotation. Would you add a property for each class that needs a default cache name?

gwenneg on 31 Jul 2019

Ah ok. Please forget what I said.
I have to admit I’d love for someone to take a step back and come with 5 to 10 code examples with various annotations options, so we can see the logic and concrete usages. It feels a bit disorganized in my head at the moment.

emmanuelbernard on 31 Jul 2019

Ok, I'll do that @emmanuelbernard.

gwenneg on 1 Aug 2019

Question following a @ben-manes comment in the draft PR:

Do we want to allow the following code?

class Alpha {
    @CacheLoad(cacheName="commonCache")
    public String getExpensiveString(@CacheKey key) {
        return "I was returned from Alpha";
    }
}
class Beta {
    @CacheLoad(cacheName="commonCache")
    public String getExpensiveString(@CacheKey key) {
        return "I was returned from Beta";
    }
}

If this is allowed, then it will block the usage of cache refresh (see Ben's comment for details).

gwenneg on 1 Aug 2019

Was away last week, catching up on things. I'll be replying to all your comments during the day :)

@emmanuelbernard

As for getting both options, here is my take: if we think >80% of the usage are going to take app method parameters into account, then I would go for the no annotation option and the full flexible provider option. If we think that it's more that a subset of a method parameter are the key, then we will need the @CacheKey but I could not read anywhere of that's only for a single param or if you can annotate several parameters with @CacheKey

You can indeed use @CacheKey to annotate several parameters. It's explained in the spec and our Infinispan code matches that.

JCache has both options annotation and SPI methods. Spring Cache does not have key specific annotation but they use Cacheable.key attribute can indicate which parameter is the key, see here. I think Cacheable.key can break easily, e.g. if you rename the parameter but forget to update the annotation attribute value? This won't be caught by the compiler and you will only find out at runtime.

I do feel we need both approaches, but if I had to start with one I'd start with the more flexible approach of the SPI. The annotation can provide an easier way to do things later. Bear in mind that the simplest thing is always going to be that all the parameters to the method are part of the key, where no extra SPI or annotations need to be added.

I also noticed that there is @CacheValue which is necessary to update a cache on changes. In your current proposal @galderz, you do require that update method to return the updated value. I don't know if that will be considered annoying or not.

Glad you noticed. Yeah, this is something JCache and Spring deviate on. For JCache, you'd write something like:

@CachePut
public void updateHotToys(Country country, @CacheValue List<Toy> toys) {
    ...
}

And for Spring (see here):

@CachePut
public List<Toy> updateHotToys(Country country) {
    ...
}

I've been slightly torn apart between both options. I'm not massively keen on the JCache approach because it adds yet another annotation, and calling a method that just updates the cache ends up with an empty method body...

I do feel that the Spring approach is more consistent with how the cache load would happen: in both cases the return type is the value. But it has the downside that the update itself has to come from inside the body of the method, which might require being wrapped in an object that has certain state?

Right now my preference is for the Spring approach, and my original design was based on that.

galderz on 7 Aug 2019

@ben-manes

Thx for the background info on the origins of sync attribute and extra explanations.

I think your default behaviour makes more sense from a caching perspective: lock-free on cache hit, lock on cache miss. This minimises the amount of locking done.

If we add an option to control this, the other option would be: lock-free on cache hit, lock-free on cache miss. I can't think of a good name for the attribute/option right now...

Thanks for sharing extra options for caching. From my POV, the key thing right now is if we can achieve those down the line with a pure annotation approach. In other words, if they can done purely with annotations, I want to make sure that whatever we design doesn't put us in a dead end when it comes to implementing these suggestions:

Refresh: reload async if accessed after a time interval as the value may be stale

^ Like that. Other than the time interval, is there any extra configuration you'd expect with this? This time interval could be an attribute for @CacheLoad annotation.

Removal notifications: the key, value, and cause

Hmmmm, I assume you mean removal notifications of data that has been removed from the cache. What's the use case you have in mind exactly? When would the user be interested to find out about these?

And what kind of removals do you have in mind? Removals triggered by calls to methods with @CacheInvalidate/@CacheInvalidateAll? Or removals triggered by internal eviction/expiration? Talking of which...

Sooner or later we'll need a way for the caches to expire things when memory is "tight". I've not thought about this yet, and it has its complexities (e.g. control number of elements, or control memory in use...etc). Even if using a remote cache server in the backend, a near cache is a common feature that speeds up things.

Personally, I'd like eviction to be something that happens automatically. This means that the user does not need to configure anything and the system decides when cache memory is tight and evicts things based on a given algorithm. This is what happens with CPU caches, we don't go and configure the size of those. I don't think the user should have to worry about this.

With this in mind, I think it could be interesting for the user to be notified when the internal logic decides to evict something from memory, because it could decide to bring it back... However, if memory is tight, it could lead to problems if the user keep adding things that the internal cache system decides to remove.

Related: I've also left out lifespan and max idle expiration settings for cached data. Although users might be used to configuring these, I've found very few small use cases where that's really needed: e.g. when there's a business constraint to keep some cached data for a given duration.

Bulk load: optimize for multiple key lookups

Makes sense. It'd be nice to have a pure annotation approach for this. Don't think it'd be too hard since the difference here is just that you have N keys as opposed to 1 key.

It is also happens that users sometimes try to use a Map-based cache for single item memoization. In that case, I try to advise using Guava's Suppliers.memoize or Suppliers.memoizeWithExpiration as much ligherweight. There may be value in deciding how to answer the no-key case in your designs (not applicable, users provide a surrogate key, allow no key (internal surrogate), or new annotations).

I've not thought about that, but I can see the use for it.

galderz on 7 Aug 2019

@gwenneg

My understanding is that these two annotations would use different namespaced caches since the methods signatures are different. Then how would we invalidate all entries from the @CacheLoad annotated method? When would @CacheInvalidateAll be used without declaring an explicit cacheName parameter? Same question goes for @CacheStore and @CacheInvalidate.

Well spotted, this premise doesn't work as well. Let's revise it, I see two options:

Change the default such that: each annotated method use a namespaced cache whose name is derived from the FQN of the type. This would make naturally lead to all annotated methods in same class to share the same cache namespace by default. You'd still have the possibility to use cacheName to tweak things further.
Make cacheName a mandatory attribute of annotations. This makes things more explicit and a little bit more verbose.

I'd favour the first option.

galderz on 7 Aug 2019

@gwenneg

The no-cacheName annotations actually make sense in the JCache specification because the cache name defaults to CacheDefaults.cacheName(). Do we want a CacheDefaults annotation in Quarkus too?

Hmmm, don't like the idea of adding an annotation for this.

Over the years we've had a lot of problems due to the default cache in Infinispan. It forces you to think in a slightly different way to those caches that represent a namespace. These days we're trying to move away from the default cache. That's why I'd like all Quarkus user caches to have a namespace by default and avoid having a default namespace catch-all. My initial proposal on the default namespace was too fine. I think a coarser approach (based on type name) could work better.

galderz on 7 Aug 2019

@gwenneg

Do we want to allow the following code? If this is allowed, then it will block the usage of cache refresh (see Ben's comment for details).

First of all, kudos on coming up with the example. It's a non-trivial example that requires some thought 👏

Before we delve into the issue, note that the code could be simplified since @CacheKey would not mandatory. If the method has a single parameter, it can be safely assumed that this is the key:

class Alpha {
    @CacheLoad(cacheName="commonCache")
    public String getExpensiveString(Object key) {
        return "I was returned from Alpha";
    }
}
class Beta {
    @CacheLoad(cacheName="commonCache")
    public String getExpensiveString(Object key) {
        return "I was returned from Beta";
    }
}

Even if the method has multiple parameters, we could assume that all parameters are part of the key and generate one with the combined values of those. Only when one, or more parameters, are not part of the key the user would need to use @CacheKey.

Going back to the code itself, without refresh, I think this code is valid. It's in the user's hands to decide which method to call, and so the user is in control of what goes into the cache. Note that once one of the method is called and the String is cached, calling the other method won't override the cached entry since the key is already present (assuming same key of course).

If @CacheLoad takes a refresh option and both annotations would define a value in the example, then I think this code should not be allowed. The reason it should not be allowed is because the cached result would not be deterministic.

The only use case I can see for this code is if you have two datasources for the same type of data that you're trying to put into a single cache. This could happen if say you're migrating from one data source to the other. In that case, I think refresh should only be allowed in one of the annotations, not in both...

So, in summary, I don't think our current annotation design blocks refresh, though we'd have to think in which cases refresh makes sense.

galderz on 7 Aug 2019

@galderz

First of all, kudos on coming up with the example. It's a non-trivial example that requires some thought 👏

The merit goes to @ben-manes, I only relayed here with this example a question he asked in the draft PR :)

Thanks for your comments, I will update the PR todo list and content accordingly.

gwenneg on 7 Aug 2019

@gwenneg

My understanding is that these two annotations would use different namespaced caches since the methods signatures are different. Then how would we invalidate all entries from the @CacheLoad annotated method? When would @CacheInvalidateAll be used without declaring an explicit cacheName parameter? Same question goes for @CacheStore and @CacheInvalidate.

Well spotted, this premise doesn't work as well. Let's revise it, I see two options:

Change the default such that: each annotated method use a namespaced cache whose name is derived from the FQN of the type. This would make naturally lead to all annotated methods in same class to share the same cache namespace by default. You'd still have the possibility to use cacheName to tweak things further.

Make cacheName a mandatory attribute of annotations. This makes things more explicit and a little bit more verbose.

I'd favour the first option.

I like the first option too. I'll do a first implementation right away and prepare a dedicated code example for the future Quarkus caching guide.

gwenneg on 7 Aug 2019

@galderz

Refresh... Like that. Other than the time interval, is there any extra configuration you'd expect with this? This time interval could be an attribute for @CacheLoad annotation.

Caffeine currently supports refreshAfterWrite(duration), making an entry eligible for refresh based on a fixed time after the entry's last write (create/update). When eligible, the entry will be refreshed on its next read asynchronously and the old value returned while that is in-flight. This is not scheduled refresh, where all entries are reloaded regardless of usage, since that is trivial to implement independently and rarely desirable.

There is a feature request to add refreshAfter(refresher) so tha the time interval can be computed dynamically, per entry. This is for a CDN-style case, I believe.

Caffeine, and other libraries, require the reload function and duration upfront during construction. This hamper's Spring Cache which only provides the loading function on a get(key, func). I don't think providing the reload function through the Cache API is appropriate (e.g. cache.get(key, loader, reloader)), and we try to nudge users into using a LoadingCache as a less error-prone abstraction. If there is only one @CacheLoad method then you can supply it upfront to the cache and leverage its features.

The reload function differs slightly from load by supplying the old value. That delegates to CacheLoader.load by default for simpler code when it isn't needed.

I don't believe the timeout should be on the annotation, as that moves into policy configuration territory which differs between providers. Spring Cache was correct in avoiding policy configuration, whereas JCache is harmful by dictating it.

removal notifications of data that has been removed from the cache. What's the use case you have in mind exactly? When would the user be interested to find out about these? And what kind of removals do you have in mind?

Users might need to close resources or log to get familiarity. We supply a RemovalCause enum of why (explicit, replaced, collected, expired, size).

Caffeine supports both async notification (via RemovalListener) and atomic (via CacheWriter). The latter is part of the Map.compute block, letting calling the writer atomically with the removal. That is helpful in multi-layer caches where you don't want a race condition on removals and load for the same key.

Sooner or later we'll need a way for the caches to expire things when memory is "tight". Personally, I'd like eviction to be something that happens automatically. This means that the user does not need to configure anything and the system decides when cache memory is tight and evicts things based on a given algorithm. This is what happens with CPU caches, we don't go and configure the size of those. I don't think the user should have to worry about this.

I disagree with dictating any policy configuration; leave that to the provider.

CPU caches are special as this is size-based and dictated at fabrication time.

To automatically manage memory, you have two options:

Reference cache, e.g. soft-references. These sound good in theory and is horrible in practice.
Global size with dynamic allocation. See robinhood caching for one promising scheme. This can be done as a layer above the cache provider if the right metrics / knobs are available.

So, in summary, I don't think our current annotation design blocks refresh, though we'd have to think in which cases refresh makes sense.

I would recommend trying it with a few chosen providers. As described in the Caffeine case, it might not be compatible. Ideally you don't reimplement provider features, like refreshing.

ben-manes on 7 Aug 2019

@emmanuelbernard Here are a few examples of what's already working so far in the draft PR:

The cacheName parameter is optional for all annotations. If it is absent, the class FQN is used as the cache name. It means there's no need for some kind of @CacheDefaults annotation like the JCache one to specify a default cache name at class level. For example, all of the following methods will use the org.acme.ToyService cache:

package org.acme;

public class ToyService

    @CacheLoad
    public List<Toy> getToys(Country country) {
    }

    @CacheStore
    public List<Toy> getToys(Country country) {
    }

    @CacheInvalidate
    public void removeToy(Country country) {
    }

    @CacheInvalidateAll
    public void clearToys() {
    }

}

The @CacheKey annotation is optional. If it is absent, all method parameters are considered as elements of the key. Both of the following methods will use the same key composed of country and age:

@CacheLoad(cacheName = "toysPerChildAge")
public List<Toy> getHotToys(Country country, int age) {
    return expensiveService.getHotToys(country, age);
}

@CacheLoad(cacheName = "toysPerChildAge")
public List<Toy> getHotToys(@CacheKey Country country, @CacheKey int age, BigDecimal maxPrice) {
    List<Toy> toys = expensiveService.getHotToys(country, age);
    ... // return toys list filtered on max price
}

Keys can be composed of one to several elements.

The cached value for @CacheLoad and @CacheStore is always the annotated method returned value. There is no @CacheValue annotation.

You can also take a look at the tests of all these annotations options.

gwenneg on 12 Aug 2019

@ben-manes

I don't believe the timeout should be on the annotation, as that moves into policy configuration territory which differs between providers. Spring Cache was correct in avoiding policy configuration, whereas JCache is harmful by dictating it.

Good point.

There's an extra point to add here about annotations having timeout details. Timeouts can differ depending on the environment, so actual durations might be better provided by runtime properties, while the annotation can maybe define what to do?

So for example, the annotation could be configured with the refresher SPI to use, but when the duration after which refresher the refresher kicks in comes from the properties?

It's true that this is a complex use case, which probably will need its own issue to discuss/design it detail. Maybe you could drive that?

Users might need to close resources or log to get familiarity. We supply a RemovalCause enum of why (explicit, replaced, collected, expired, size). Caffeine supports both async notification (via RemovalListener) and atomic (via CacheWriter). The latter is part of the Map.compute block, letting calling the writer atomically with the removal. That is helpful in multi-layer caches where you don't want a race condition on removals and load for the same key.

+1, makes sense.

I disagree with dictating any policy configuration; leave that to the provider.

That'd be fine. We can always push it up at a later stage.

I would recommend trying it with a few chosen providers. As described in the Caffeine case, it might not be compatible. Ideally you don't reimplement provider features, like refreshing.

+1

galderz on 12 Aug 2019

Quick question about cache names: should the following case be allowed?

package org.acme;
public class Foo {
    @CacheLoad // cache name defaults to "org.acme.Foo"
    public Object doSomething(Object key) {
        // ...
    }
}

public class Bar {
    @CacheInvalidate(cacheName = "org.acme.Foo")
    public Object doSomethingElse(Object key) {
        // ...
    }
}

This can be detected and forbidden at build time in multiple ways:

A prefix (for example default_) can be added to default cache names and disallowed at the beginning of a user chosen cache name.
Default cache names can be listed at build time and then user chosen cache names can be disallowed if they match a default cache name.

Or maybe this is perfectly fine and there's nothing to do about it.

WDYT?

Edit: I changed the second cache annotation.

gwenneg on 14 Aug 2019

@gwenneg I think that should be allowed. It would of course be better if the user defined their own cache names in such case, but it is fine as is.

galderz on 16 Aug 2019

👍1

(I didn't click the right button sorry!)

gwenneg on 22 Aug 2019

@gwenneg I think that should be allowed. It would of course be better if the user defined their own cache names in such case, but it is fine as is.

Me too, it will only go wrong if you explicitly define your own cache names, not if you use the default ones.

Davio on 23 Aug 2019

Thanks for your comment @Davio! This case is completely allowed, without any kind of build time check or prefix usage in the current version of the PR.

We've discussed many details of this extension here, but there are also some discussions in the PR (#3394). Do not hesitate to take a look and leave some comments :)

gwenneg on 23 Aug 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you!
We are doing this automatically to ensure out-of-date issues does not stay around indefinitely.
If you believe this issue is still relevant please put a comment on it on why and if it truly needs to stay request or add 'pinned' label.

stale[bot] on 13 Nov 2019

That's still in progress.

gwenneg on 13 Nov 2019

Hi all! Thanks a lot for your patience with this, in particular @gwenneg who has been working really hard on the implementation 👏👏👏.

With the help of @Sanne and @emmanuelbernard I’ve been reviewing the current design and progress, to decide whether we’re ready to release or not. The short answer is that we are but we will go a bit conservative in what we expose initially. We’ve noted a few areas that we need to decide on:

Annotation names

No one seems happy with CacheLoad, so we need a new name for it. If I had to choose between Cacheable and CacheResult, I’d probably go for the latter. How do people feel about switching to CacheResult?

Locking:

After going backwards and forwards in this issue, we agreed that lock-free on cache-hit and lock on cache-miss was the best combo for a user that does not want to think about these issues. Doing this however would raise a big question mark: what happens on lock timeout? Do we need to notify the user of that? If so, how?

If we look at CacheResult, the lock timeout could be just ignored and the callers waiting for the lock would just invoke the method. Since the lock was not acquired, the cache would not be updated, which would be ok since no stale data would be there.

If we look at CacheStore, a failure to acquire the lock would be problematic. If you ignore it and don’t update the cache, the cache would be left with stale data. Not good. We could log it, but there might be a better way out, see next section.

Note that we can get started and release the first preview without the lock scheme if we agree to work on it for the next release. It won’t be worse than what Spring does.

Remove `@CacheStore`?

While thinking about locking, and in particular the issues that CacheStore has, the early feedback from @loicmathieu (see here) came to my mind:

Technically, we don’t need @CacheStore. A method that inserts or updates the cached value could simply be annotated with @CacheInvalidate in such a way that the invalidation happens first, before invoking the method. That’d be just fine, the next time @CacheResult gets called, the data is brought back from cache and all good.

The performance might not be the optimal compared to implementing @CacheStore right away, but we can always add it later if we really need it.

If we don’t add @CacheStore, part of the problem of the locking problem goes away.

Note that another way around that is the following:

@CacheInvalidate
@CacheResult
public void foo(@CacheKey String key, @CacheValue String value) { … }

But it’s likely less readable than a proper @CacheStore.

Question for Spring Cache annotation users, are you feeling bothered or limited by the fact that a cache put method must return the value to be put in the cache? It feels like the natural contract would be more

@CachePut public void updateBook(ISBN isbn, Book book) rather than @CachePut public Book updateBook(ISBN isbn).

Async method support

Related to locking is the async support. Imagine that we annotate a synchronous method with CacheResult whose implementation takes 20 minutes to compute something. If we do lock on cache-miss, how do we interrupt that method to release the lock? We can’t do that easily.

This can be easily achieved if the result of the invocation in asynchronous (whether using CompletionStage or whatever else we use).

If we want to lock, I think we need to have this in place soon. See below for a suggested plan.

Default Cache Name

After thinking about this further, no default-cache-name strategy seems to be optimal.

The default cache name in Spring Cache seems to be “” (I’m no expert in Spring Cache so correct me if I’m wrong!), which would mean that by default all data from all annotations falls under the same cache. Although this might seem Ok, it has implications with the naming. For example, what name should we give it so that we can set provider specific settings? e.g. expiration, maximum size...etc? And more importantly, the user implicitly puts all cached data in the same cache under possibly overlapping keys and that feels very wrong.

The default cache name in JCache is based on the type, method name and method parameters, which was @ben-manes said, can lead to many caches being created inadvertently.

For quarkus we thought of using just the type as cache name, but this feels weird too and can be confusing. @gwenneg found some corner cases where you could end up explicitly using the cache name that is implicitly used elsewhere (e.g. using “com.acme.Person” as cache name).

What about making cache name mandatory instead? Although more verbose, it would force users to think about their cache namespace structure early, which would be a good thing because most use cases would require that anyway. It also simplifies runtime application property assignment.

We can always add a default cache name later on. We could even remove the mandatory requirement of the attribute which shouldn’t break users that were setting it?

Key matching

Defining the key with @CacheKey is a bit verbose but is only needed if the @CacheResult annotated method has more than one parameter. It would be good to explore if Qute - CC @mkouba - could be used as an expression language to define an EL building the key.

@CacheResult(key=”uppercase(#isbn)”)
public Book getBook(String isbn) { … }

Cache invocation specification

One thing we’ve not specified in detail is when cache invocations happen with respect to method calls. This section tries to spell this out to make sure we’re all on the same page:

For CacheResult, the cache read would happen before method invocation.
For CacheResult, the cache write would happen after method invocation.
For CacheInvalidate and CacheInvalidateAll, the cache invalidation would occur before invoking the method.
CacheStore is not specified since the suggestion is to remove it for now.

This behaviour would not be configurable at this stage.

Plan

So the proposed plan is as followed:

Rename @CacheLoad into @CacheResult
Remove @CacheStore entirely for now: it will hurt but make people think of @CacheInvalidate
Keep @CacheInvalidate and @CacheInvalidateAll
---- First Preview version ---
Plan for the lock on cache miss approach but not stop the initial Preview for it
Including what to do on lock timeout value and notification (needs to be specified)
Non blocking types support
---- “MVP” ---
API for the more advanced use cases
Blocking and non blocking
Could be high order function based
Explore Qute as a EL engine for key
Reevaluate the come back of @CacheStore or equivalent

Thx once again for your patience. Let us know how this sounds :)

galderz on 4 Dec 2019

@galderz great description/analysis ! Maybe we need to keep this kind of _design document_ somewhere ?

Two remarks here:

Default cache name : for readonly case it can be easily set to . as the parameters will be used by the key so there cannot be any overlap due to overloads. But for read/write cache this is more complex as one method will read and another write. In this case I think that asking the user to give a proper name is acceptable.
CacheInvalidate is a write operation at cache level so needs to occurs after the method call (and if multiple cache annotation, this needs to be the first that takes effect). In order to don't invalidate a cache if the method decorated with the annotation fails.
I'm against CacheInvalidateAll and rather have an API to be used to invalidate the cache (a CacheManager API). It's and admin task not an operatonal task like what offers the other cache annotations.

loicmathieu on 4 Dec 2019

hi @loicmathieu !

The reason we went for requiring a cache name in this first round is really that it's easy to start by requiring it, and relax later. There's a couple of tradeoffs in the different options which we believe need a bit further discussion, so let's go with requiring it now - we can always follow up with quick, small adjustements.

Essentially, striving for forward compatible design to expedite this.

Regarding the CacheInvalidate note, could you elaborate on the problem? I'm assuming that when we have both @CacheInvalidate and @CacheResult we should be able to generate the operations in the right order; also I'd hope we could apply some implementation specific optimisations, such as maintain the lock, the hashcode, the lookup of the datacontainer entry.

But again even inf this case, the direction @galderz recommended is primarily so that we can have such conversations deferred, so not making them essential for an MPV

Sanne on 4 Dec 2019

@Sanne @CacheInvalidate is a side effect annotation, I expect side effect to occurs only after a method successful invocation (it's more a design best practice).

One use case I can think of is for a method that will have both @CacheInvalidate and @CacheResult in order to replace a cache item inside the cache. If this method takes long time to compute and the invalidation happens before but the method didn't success, the cache will be empty and the long computation will happens again ...

But as it's a design choice, I let you guys choose :)

loicmathieu on 6 Dec 2019

I don't tend to rule in absolute, while the invalidation is a side effect, it is the safer choice to over invalidate than to under invalidate. At least safer for your data.

emmanuelbernard on 6 Dec 2019

👍1

thanks @loicmathieu , it's a good point to keep in mind, although I prefer like @emmanuelbernard said to be safe by default. We haven't talked about transactional caches, but in case we wish to get there we should also consider isolation levels for such semantics.

Specifically for your use case of the long running computation; it sounds like you would favour reading the possibly out-of-date data over having to try recomputing again. If that's the case wouldn't you get that by simply using @CacheResult exclusively (without the invalidate) ?

I'd expect that the purpose of @CacheInvalidate is to handle the opposite requirement: expressing the need to invalidate no matter what happened. And there's of course the option to invoke two methods sequentially to drive specific more advanced needs.

Sanne on 6 Dec 2019

@galderz great description/analysis ! Maybe we need to keep this kind of design document somewhere ?

Yeah, we probably should write one so that the specification is more clearly defined. Might be a good time to do it sometime between first preview and MVP.

Default cache name : for readonly case it can be easily set to . as the parameters will be used by the key so there cannot be any overlap due to overloads. But for read/write cache this is more complex as one method will read and another write. In this case I think that asking the user to give a proper name is acceptable.

Not sure exactly what you mean with that. You mean . as cache name? In any case, invalidation is likely still required in read-only cases. You'd want to potentially invalidate the data explicitly. That's at least the reading JPA does on read-only entities. They can be loaded into the cache and evict/invalidated.

One use case I can think of is for a method that will have both @CacheInvalidate and @CacheResult in order to replace a cache item inside the cache. If this method takes long time to compute and the invalidation happens before but the method didn't success, the cache will be empty and the long computation will happens again ...

Yeah, but if we don't apply the invalidation, the method would not be invoked. CacheResult would only execute the method if the entry does not exist in the cache. If invalidation does not happen, the cached value would be returned and the method would not be invoked.

Also, the cache being empty and computation happening again is fine. It's a cache :). As @emmanuelbernard said, it's better to over invalidate than under invalidate.

Having said all of that, I'd be very interested to hear use cases of method implementation for CacheInvalidate alone (no CacheResult) where it'd make sense to do the invalidation after and not before. Right now I cannot think of one...

galderz on 13 Dec 2019

Default cache name => Classname.methodName for read only.

Usually usage of invalidation without reloading the data is the case of flushing a cache region (or all of them).
In a lot of e-commerce site, plublication of data is done on regular basis (each hour, each day) to avoid too much updates of the site and having good HTTP cache ratio (HTML pages are usually cached). So they "publish" the new site and flush the cache. But I already advertise to offers a programmative API that I think will be more usable for this ...

loicmathieu on 13 Dec 2019

In your e-commerce scenario, I think classname.method name would be a confusing default cache name actually for the method implementing invalidation programmatically. That's where being explicit just clarify things for everyone.

emmanuelbernard on 13 Dec 2019

@emmanuelbernard When I say programmatively I mean without annotations, like CacheManager.flush("regionName") or CacheManager.flushAll()

loicmathieu on 13 Dec 2019

Default cache name => Classname.methodName for read only.

It certainly depends on the use case one has in mind, but I'd be careful as it might be a totally surprising default in some other use cases.

Say for example I have a "StuffManager" DAO-like class, I'd want my STUFF_CACHE to be shared for all usages of the STUFF type across all methods it has - having a different cache for each method would be inefficient, consume more memory and - when invalidating only one of these could introduce bugs of the worst kind, as they are hard to spot and the framework can't warn about "suspicious activity"

Sanne on 13 Dec 2019

FYI I've created https://github.com/quarkusio/quarkus/issues/6848 because I believe that the cache name should be defaulted.

mkouba on 29 Jan 2020

Now that the first version of the quarkus-cache extension is merged, we could start discussing about the programmatic caching API for advanced use cases. There's already been an advanced use case question in #6903 and maybe the programmatic API could be the answer to that question.

If there's a consensus about the operations/features the API should support, I could start working on it. But first, should we continue the discussion here (there's a huge history and everyone gets automatically notified) or start a new issue about the programmatic API?

gwenneg on 7 Feb 2020

@gwenneg my proposal is, @karesti and you can start something and exchange. When you feel you ahve more questions than answers, send an email to quarkus-dev@ with proposals and code sample (usage). Or we can go and do a dedicated issue but I tend ot like emails a bit better.

The main principle is again the MVP for a programmatic API as well as thinking imperative and reactive (which means looking at Mutiny).

emmanuelbernard on 7 Feb 2020

Oh BTW I agree, do the 80% use cases well and accept it won't do everything makes sense for the Quarkus usage. We know people will be able to fall back to full Caffeine, Infinispan, etc APIs when they need more.

Still, would be great if underlying caching technology was pluggable. Are there any plans to implement it in near future?

Looks like some groundwork was already done as Caffeine is just one of _many_: https://github.com/quarkusio/quarkus/blob/b7d8ef57d5b2b1b64191dfc6c50e3c50cb89942b/extensions/cache/deployment/src/main/java/io/quarkus/cache/deployment/CacheProcessor.java#L81

pivovarit on 10 Feb 2020

Having multiple caching providers was part of the initial plan 7 months ago (see here for more details). That's why CacheDeploymentConstants.CAFFEINE_CACHE_TYPE is still around. I don't know if this possibility will be considered in near future though.

gwenneg on 10 Feb 2020

@gwenneg @karesti I am working on the roadmap for the next 3-6 months. I would think this issue should be closed and rather have specific epics for the next few months (e.g. the API work). Thoughts?

emmanuelbernard on 24 Mar 2020

I agree @emmanuelbernard.

I've been working (at a slower pace than initially planned) on a PR on my fork about the caching API. I need to finish a few details about the async aspect of the API (I'm using Mutiny) and I'll submit a WIP PR pretty soon.

gwenneg on 24 Mar 2020

@gwenneg is there a issue to track Cache API?

emmanuelbernard on 24 Mar 2020

Closing and a Cache API issue will follow up.

emmanuelbernard on 24 Mar 2020

@gwenneg is there a issue to track Cache API?

Not yet, there's only a discussion in the mailing list about the API.

gwenneg on 24 Mar 2020

OK, I've created https://github.com/quarkusio/quarkus/issues/8140 for it

emmanuelbernard on 25 Mar 2020

👍1

Quarkus: Annotation-driven application data caching

All 80 comments

Annotation names

Locking:

Remove `@CacheStore`?

Async method support

Default Cache Name

Key matching

Cache invocation specification

Plan

Related issues

Quarkus: Annotation-driven application data caching

All 80 comments

Annotation names

Locking:

Remove @CacheStore?

Async method support

Default Cache Name

Key matching

Cache invocation specification

Plan

Related issues

Remove `@CacheStore`?