Our parameter-based query cache currently operates on parameter nullability. This is a mechanism we had in previous versions which allows us to vary the SQL based on whether a parameter's actual value is null, and it is potentially important for perf (#17543). This issue proposes a more general design that could unlock parameter-based caching for other scenarios - not just null checks on parameters.
- For `listParam.Contains(x)`, we currently expand the list parameter to constants in the SQL, resulting in `x IN (constant1, constant2...)`, since there are no list parameters. Instead, we could expand to a parameterized list - `x IN (@p1, @p2)` - based on the number of elements in the given list (i.e. we'd cache one version for 1 param, another for 2 params, etc.). This would allow us to avoid the additional execution-time visitation, and utilise database query plans in a much better way (we need to think about cache explosion and eviction though). This is tracked by #13617, but this feature could be a building block.
- For `StartsWith`, we could vary the SQL based on the pattern parameter's value - e.g. generate a simple `LIKE` rather than the `LEFT(LEN())` expression as we do now (although this would require exposure to method translators). We could even rewrite the search pattern to escape the wildcard characters.
- For a conditional over a bool parameter (`@p ? A : B`) - e.g. `boolParam ? x : y` - we could elide the condition entirely in SQL, caching one version per parameter value. See also #19168.
- For providers that translate list parameters directly, e.g. to `x = ANY (@listParam)`: depending on whether the parameter contains any nulls, an additional null check needs to be added (`OR x IS NULL`). We could cache two command versions - one for when the list contains null, and another for when it doesn't.

For the following, let's concentrate on reimplementing the existing nullability-based caching.

Assumption/limitation:
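To make the list-expansion scenario concrete, here is a minimal sketch - names and shapes are hypothetical, not EF Core API - of how the generated SQL could vary with the number of elements in the list parameter:

```C#
// Illustrative sketch only - not EF Core API.
// For listParam.Contains(x) we'd generate one SQL variant per list length, so any
// two executions with equally-sized lists share a cached command and a query plan.
using System.Linq;

static string ExpandInClause(string column, int listLength)
{
    var placeholders = Enumerable.Range(1, listLength).Select(i => $"@p{i}");
    return $"{column} IN ({string.Join(", ", placeholders)})";
}

// ExpandInClause("x", 2) => "x IN (@p1, @p2)"; the cache key for this variant would
// carry the length (2), so it is reused for any 2-element list.
```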
The above takes care of nullability-based caching, where the visitor(s) involved run after the parameter-based cache has been consulted. However, we also want to vary SQL for parts of the tree that are handled before the parameter-based cache is consulted, and therefore cannot yet access the QueryContext. For example, we want to emit different SQLs for StartsWith, which is translated early. To support this, we can introduce a new expression type, QueryContextDependentExpression, which can be "resolved" into another, concrete expression, later in the pipeline when the QueryContext is known.
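A minimal sketch of what such an expression type could look like (hypothetical shape - the resolution callback receiving the QueryContext is an assumption, not a decided design):

```C#
// Hypothetical sketch - not an actual EF Core type.
// Early translation (e.g. of StartsWith) emits this placeholder node; later, once the
// QueryContext is known, the pipeline calls Resolve to obtain the concrete expression.
public class QueryContextDependentExpression : Expression
{
    private readonly Func<QueryContext, Expression> _resolver;

    public QueryContextDependentExpression(Func<QueryContext, Expression> resolver)
        => _resolver = resolver;

    public Expression Resolve(QueryContext queryContext) => _resolver(queryContext);
}
```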
While not technically part of this proposal, this feature is important to support the scenarios above. Ideally, we'd have a way to perform arbitrary parameter transformations before sending out parameters. Scenarios for this include:
We can discuss later whether a new parameter would be added or the existing value would be mutated.
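As an illustration (hypothetical, not a proposed API), a transformation registered for the StartsWith scenario could escape LIKE wildcard characters in the pattern before the parameter is sent:

```C#
// Illustrative only: a transformation the StartsWith translation could register to
// escape LIKE wildcard characters in the pattern parameter before it is sent out.
static string EscapeLikeWildcards(string pattern)
    => pattern.Replace("\\", "\\\\").Replace("%", "\\%").Replace("_", "\\_");

// EscapeLikeWildcards("10%_a") => "10\%\_a" - combined with ESCAPE '\' in the SQL,
// the user's pattern matches literally as a prefix.
```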
The first one is filed as #13617.
Oh nice, I wasn't aware of that issue (and didn't know so many people have run into it). This issue could provide a general way to deal with that.
@smitpatel @AndriySvyryd @ajcvickers updated the above based on our recent discussions, hope it's more or less accurate.
A few things:
First of all, we should look into the cache design and delegates based on parameter nullability, so that we can avoid having parameter nullability affect the cache when the parameter is not used. Everything else can be low priority.
We cannot use QCC. RelationalCommand caching is a relational concept, and nothing related to it can live on QCC.
The idea of caching something based on parameter values isn't necessarily specific to relational - especially as all we'd be doing is registering bool-returning lambdas there. But of course if we have another context (or possibly RelationalQCC?) that works too.
> First of all, we should look into the cache design and delegates based on parameter nullability, so that we can avoid having parameter nullability affect the cache when the parameter is not used. Everything else can be low priority.
@smitpatel sure. I think that means doing part 1 (Delegate-based parameter cache lookup) and leaving QueryContextDependentExpression and client-side parameter transformation for later?
> The idea of caching something based on parameter values isn't necessarily specific to relational
The idea of caching anything after the first-level cache is not necessarily the norm. The compiled query delegate should remain the same regardless of parameter values. But relational does not work that way, since it generates a different SQL command. Let's not pollute core with relational concepts. InMemory certainly does not work that way. The same could be true for Cosmos or other non-relational providers.
OK. In that case we could extend and have a RelationalQCC, or possibly just have the relevant visitor(s) expose these lambdas on themselves via a property, and have the calling code collect them.
While it is an implementation detail, we don't need a RelationalQCC either. You are going to have a multiplexer to run the caching; just make it part of that. Or, as @AndriySvyryd suggested, each visitor can compose over the cache key so they don't have to register anything outside.
With the limitation that the lambda can only return a bool I don't see how this can be used to fix #13617 as there the cache key would need to contain the length of the list parameters.
Here's a counter-proposal that @smitpatel referred to:
- A `ParameterBasedCacheKeyGenerator` is instantiated for every query and stored in the `RelationalCommandCache`.
- As relational visitors are invoked, the `ParameterBasedCacheKeyGenerator` is made available to them, to be used to store a lambda that takes the `QueryContext` and an object (a different cache key) and returns a new cache key that contains the previous key and any additional information.
- The `ParameterBasedCacheKeyGenerator` would have a method `CheckNullability(int)` that would register the parameter at the given index to have its nullability be part of the cache key.

For example, the default cache key would be:
```C#
class RelationalCommandCacheKey
{
SelectExpression _selectExpression;
bool _useRelationalNulls;
BitArray _paramNullability;
}
```
With `_paramNullability` being null/empty unless `CheckNullability` is called.
The visitor handling `listParam.Contains(x)` would create a lambda that when run would produce something like this for every list parameter in the query:
```C#
class RelationalListCacheKey
{
object _relationalCommandCacheKey;
int _listLength;
}
```
Using value types for cache keys is not beneficial as they are always boxed in IMemoryCache
> With the limitation that the lambda can only return a bool I don't see how this can be used to fix #13617 as there the cache key would need to contain the length of the list parameters.
That's true. If we want to support non-boolean scenarios, the cache key can also be a simple byte[] into which we append the outputs from the lambdas. A visitor can register a bool-returning lambda, or an int-returning lambda (or anything really), and their values would be packed in order into a single "buffer" (see below for a sketch of an API shape).
> As relational visitors are invoked, ParameterBasedCacheKeyGenerator is made available to them to be used to store a lambda that takes the QueryContext and an object (a different cache key) and returns a new cache key that contains the previous key and any additional information.
The first part of that makes sense to me as a mechanism for registering the lambdas, on the first time a query is being processed (when we're building the ParameterBasedCacheKeyGenerator). But I'm not understanding the second part: how would we match a query whose parameter values are already in our parameter-based cache? We need to have a single cache key generator somewhere that can be executed on the QueryContext once we reach the parameter-based cache, and which yields a single cache key that we can then look up in the IMemoryCache...
In my mental model, things could look something like this:
```c#
class ParameterBasedCacheKeyGeneratorBuilder
{
    // Registration methods for use while building the generator, when first compiling the query variant
    public void RegisterLambda(Func<QueryContext, bool> lambda) { /* ... */ }
    public void RegisterLambda(Func<QueryContext, int> lambda) { /* ... */ }
    // Possibly registration for other lambda types if we think we need them

    // At the end of compilation we call this to build the generator
    public Func<QueryContext, byte[]> Build()
    {
        // Based on all registered lambdas, returns a function that calculates the cache key
        // for a given QueryContext.
        // For extra perf we could accept lambda expressions, combine them and compile out a single delegate here.
    }
}
```
For each query we just execute the lambda returned by Build to get the cache key (a byte[]), which we can then use as the key for RelationalCommandCache.
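To make that concrete, usage of the sketched builder could look roughly like this (hypothetical names - `RegisterLambda`/`Build` are from the sketch above, not a real API):

```C#
// Hypothetical usage of the builder sketched above.
var builder = new ParameterBasedCacheKeyGeneratorBuilder();

// e.g. the nullability visitor registers: is parameter "p" null?
builder.RegisterLambda(qc => qc.ParameterValues["p"] == null);

// e.g. a list-expansion visitor registers: how many elements does "listParam" have?
builder.RegisterLambda(
    qc => ((System.Collections.ICollection)qc.ParameterValues["listParam"]).Count);

// Once compilation completes, collapse everything into a single delegate...
Func<QueryContext, byte[]> keyGenerator = builder.Build();

// ...and at execution time, one invocation yields the key for the command cache:
byte[] cacheKey = keyGenerator(queryContext);
```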
> `SelectExpression _selectExpression;`
It may be better to simply scope RelationalCommandCache to a given query, instead of making it global (so instead of a single static dictionary we'd have a dictionary per QueryingEnumerable). This would make the dictionaries smaller, and we wouldn't have to go through useless comparisons with cache keys of other queries. The downside is managing eviction/cache explosion per-query instead of globally.
> `bool _useRelationalNulls;`
We could simply include this kind of info inside the cache key, just like parameters on the QueryContext (i.e. as another value packed into the byte[]).
> Using value types for cache keys is not beneficial as they are always boxed in IMemoryCache
Yeah, that seems unfortunate. One allocation for the cache key is probably OK, although we could look at high-perf alternatives (I have some ideas, but we probably shouldn't look at this for now).
I am still concerned that completely decoupling the visitors might result in duplication for nullability, but we can let benchmarks dictate whether this is something that needs to be optimized out.
> ... the cache key can also be a simple byte[]
We still need to wrap it to override Equals and GetHashCode.
> For extra perf we could accept lambda expressions, combine them and compile out a single delegate here.
Agree, the lambda expressions registered by the visitors can also just take the IReadOnlyDictionary<string, object> of the parameter values.
> It may be better to simply scope RelationalCommandCache to a given query...
We had already decided that we would use a single global cache to allow a single global memory limit. But we can revisit this, though we should still always have a hard limit on the 2nd level cache.
> ... wouldn't have to go through useless comparisons with cache keys of other queries ...
As long as we produce reasonably distributed hash codes this shouldn't be an issue in practice
> I am still concerned that completely decoupling the visitors might result in duplication for nullability, but we can let benchmarks dictate whether this is something that needs to be optimized out.
You mean because multiple providers would register lambdas for the same thing (i.e. the nullability of a parameter)? Hopefully the visitors and pipeline can be designed so that this doesn't happen... But we can cross that bridge when we get to it.
> ... the cache key can also be a simple byte[]
>
> We still need to wrap it to override Equals and GetHashCode.
True.
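For instance, such a wrapper could look like this - a sketch only, written as a class rather than a struct since, as noted above, value-type keys would be boxed by IMemoryCache anyway:

```C#
// Sketch: wraps the packed byte[] so cache lookups compare by content rather than
// by array reference. A class, since IMemoryCache would box a struct key anyway.
public sealed class CommandCacheKey
{
    private readonly byte[] _bytes;

    public CommandCacheKey(byte[] bytes) => _bytes = bytes;

    public override bool Equals(object obj)
        => obj is CommandCacheKey other && _bytes.AsSpan().SequenceEqual(other._bytes);

    public override int GetHashCode()
    {
        // Simple FNV-1a over the packed bytes; any well-distributed hash works here.
        unchecked
        {
            var hash = (int)2166136261;
            foreach (var b in _bytes)
                hash = (hash ^ b) * 16777619;
            return hash;
        }
    }
}
```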
> Agree, the lambda expressions registered by the visitors can also just take the IReadOnlyDictionary<string, object> of the parameter values.
Yeah, it's possible - although we may want to involve non-parameter values in the cache. One possibility is the relational null setting, there may be others...
> It may be better to simply scope RelationalCommandCache to a given query...
>
> We had already decided that we would use a single global cache to allow a single global memory limit. But we can revisit this, though we should still always have a hard limit on the 2nd level cache.
Yeah, that's the downside. If we really want to, we could have a hard limit without necessarily having a global cache, but that would be a custom implementation and not MemoryCache. That's just an idea, to start with we can definitely keep the current system instead and profile/benchmark afterwards.
I'm facing a problem with Oracle that I believe could benefit from this. The basic use case is an indexed prefix search (StartsWith or EF.Functions.Like) on a million row table.
```sql
WHERE column LIKE 'John%' -- uses index
WHERE column LIKE :p -- does not use index, to my astonishment (bind variable peeking should probably take care of it)
WHERE INSTR(column, :p) = 1 -- does not use index
WHERE INSTR(column, 'John') = 1 -- does not use index
WHERE SUBSTR(column, 1, LENGTH(:p)) = :p -- does not use index
WHERE SUBSTR(column, 1, LENGTH('John')) = 'John' -- does not use index
```
What I've been doing so far (in EF6; it seems possible in EFC3 too) is creating a custom scalar function with an unguessable name and wrapping the parameter like so during query compilation:
```sql
WHERE column LIKE InlineMe187afbc3176bcb3556(:p)
```
then a command interceptor inlines the actual parameter value just before the command executes. The actual query expression would be the same regardless of the parameters but the marked parameter values should always be inlined in the command text.
This is very far from ideal and should probably be handled in the RDBMS (slim chance in the foreseeable future), but for the time being I'm stuck with this hack - and with the occasional devs fleeing in terror when they come across the code. The statement cache thrashing has much less perf impact than a full-table scan.
@bachratyg - I think this issue is a bit more advanced than your particular use case. This issue tracks a way to actually cache the relational command for a set of values of a given parameter. In your case, you always want to inline the parameter value, which means the set of values the parameter can take is not finite, and this caching should not be used. Instead you should use the non-cached version of RelationalCommand. When InExpression.Values is not null, we do that kind of processing. In EF Core 5.0, you should be able to inject your custom visitor into RelationalParameterBasedSqlProcessor, which would basically inline the values whenever a LikeExpression has a parameter specified, and mark the generated command as not to be cached.
@bachratyg don't hesitate to ping us if you need some guidance. The constantization of InExpression.Values which @smitpatel mentioned above can be found here, and don't forget to disable caching (as that code does) by calling DoNotCache.
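For later readers, the general shape of that approach would be something like the following. This is a rough conceptual sketch only - the exact base-class method names and signatures vary across EF Core versions, so consult the linked source for the real shape:

```C#
// Rough sketch only - method names/signatures are approximations of the EF Core 5.0
// internals referenced above, and this may not compile as-is against any given version.
public class LikeInliningSqlProcessor : RelationalParameterBasedSqlProcessor
{
    // (constructor forwarding the base dependencies omitted)

    // Conceptually: visit the query, and whenever a LikeExpression's pattern is a
    // SqlParameterExpression, replace it with a SqlConstantExpression holding the
    // actual value from the parameter dictionary - then mark the command as
    // non-cacheable (the DoNotCache call mentioned above), since the generated SQL
    // now depends on the parameter value.
}
```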
Thanks for pointing me in the right direction. Indeed, this is what I was looking for. The lack of a simple way to bypass the command cache was the main problem I was trying to work around when I came up with this solution. This will greatly simplify the code when we finally make the move to 5+.