Runtime: GetOrCreate is not atomic, which is a serious issue for a cache (e.g. poor performance and memory issues)

Created on 14 Aug 2018 · 24 comments · Source: dotnet/runtime

This blog post provides the details:

https://tpodolak.com/blog/2017/12/13/asp-net-core-memorycache-getorcreate-calls-factory-method-multiple-times/

We see poor performance in roughly 1 in 10 queries due to unnecessary multiple invocations of the factory, even under low load, and have had to resort to a solution similar to the one in that blog post, which is not great.

This was previously closed with no good reason: https://github.com/aspnet/Caching/issues/359

area-Extensions-Caching tenet-performance

Most helpful comment

It's especially an issue with newer devs: the general expectation of how it works is not met, which results in investigations and tickets and lowers people's opinion of .NET Core. Caching is pretty important.

Many people have this issue now without even knowing about it.

All 24 comments

There are two distinct concerns in the issue you are referencing:

  • The result that is added to the cache could differ from the one that is returned
  • The lambda is called multiple times concurrently

Based on the blog post you wrote, it's clear that you disagree with the fact that the lambda is not protected from reentrancy "by default". But this behavior is by design (as you also mentioned), just as it is for ConcurrentDictionary.

The idea is that each scenario is different and some might not require a lock. You are free to lock the lambda if the operation is not cheap and even two concurrent calls would be worse. After all, if the default implementation had to lock, it would have to do so quite aggressively on the key; you might instead want to minimize the locking by providing a better implementation, as you demonstrate in your blog post.
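For illustration, the coarsest version of locking it yourself might look like this (a sketch; the LoadExpensive method and the class around it are hypothetical, not from this issue):

``` C#
using Microsoft.Extensions.Caching.Memory;

public static class CoarseLockingExample
{
    private static readonly object _gate = new object();

    // Hypothetical expensive operation, standing in for a DB or web call.
    private static string LoadExpensive(string key) => key.ToUpperInvariant();

    public static string GetItem(IMemoryCache cache, string key)
    {
        // Lock-free fast path for cache hits.
        if (cache.TryGetValue(key, out string item))
            return item;

        // Coarse slow path: a single lock means only one thread ever runs the
        // factory; everyone else waits and then finds the value in the cache.
        lock (_gate)
        {
            return cache.GetOrCreate(key, entry => LoadExpensive(key));
        }
    }
}
```

As noted, this is quite aggressive (one lock for all keys); per-key locking minimizes contention but is more involved.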

And for this reason I can only expect this issue will be closed too.

@sebastienros I'm here for the 2nd concern you mentioned, i.e. blocking re-entrancy. Although I agree with you that each scenario is different and that lock-free might be possible, in my opinion a default implementation that prevents re-entry would be hugely useful for us laypersons: if we are "educated" enough to seek help from a cache, populating the cache must be expensive, so allowing re-entry by default sounds contradictory.

@sebastienros I'd argue that it doesn't make sense to have a lock-less cache-miss scenario. Let's consider a common scenario where the cache is used during retrieval of something from an external source, e.g. a web request. This may or may not be deterministic; it doesn't really matter either way for this discussion. In this scenario, the caller for a given key has to wait for the web request, and parallel web requests do not improve latency. (In fact, they will reduce performance due to the extra resources required.)

The typical solution to this is the double-checked lock that was mentioned in the article that @bklooste linked to. In this implementation a cache hit is typically lock-free, and locks are only used during a refresh of the stored item for that key.

A better solution than the one in the article would be to use an "async/awaitable keyed lock", but this needs to be bounded so that you don't leak memory over time. This has the advantage of narrowing the scope of the locking to just the keys that are being refreshed, but the disadvantage that you effectively need a second level of cache (this time with an "async/awaitable lock" which operates globally across all of the keys) for the sole purpose of managing the keyed locks.

I'll demonstrate with some code (sketched here as C#, with SemaphoreSlim standing in for the async/awaitable lock and memoryCache as a field of the enclosing class):

``` C#
async Task<Item> GetOrCreateAsync(string key, Func<string, Task<Item>> factory)
{
    // Lock-free fast path for cache hits.
    if (memoryCache.TryGetValue(key, out Item item))
    {
        return item;
    }

    // Slow path: double-checked, per-key async lock.
    var keyedLock = await GetKeyedLockAsync(key);
    await keyedLock.WaitAsync();
    try
    {
        if (memoryCache.TryGetValue(key, out item))
        {
            return item;
        }

        item = await factory(key);
        memoryCache.Set(key, item);
        return item;
    }
    finally
    {
        keyedLock.Release();
    }
}

// The second-level "cache of locks": the per-key semaphores live in the same
// cache under a derived key, guarded by one global semaphore that is only
// taken on a lock-cache miss.
private static readonly SemaphoreSlim _globalLock = new SemaphoreSlim(1, 1);

async Task<SemaphoreSlim> GetKeyedLockAsync(string key)
{
    var lockKey = key + "¦lock";
    if (memoryCache.TryGetValue(lockKey, out SemaphoreSlim keyedLock))
    {
        return keyedLock;
    }

    await _globalLock.WaitAsync();
    try
    {
        if (memoryCache.TryGetValue(lockKey, out keyedLock))
        {
            return keyedLock;
        }

        keyedLock = new SemaphoreSlim(1, 1);
        memoryCache.Set(lockKey, keyedLock);
        return keyedLock;
    }
    finally
    {
        _globalLock.Release();
    }
}
```

I agree that a method called GetOrCreate sounds like it will work atomically. I also took 'Threadsafe' in the documentation as confirmation that it was atomic. Otherwise, what's the point of the method? If it's not atomic, I might as well make multiple Get / Create calls and know exactly what I have to deal with regarding locking. You really need to make this clearer in the documentation of GetOrCreate, as it makes a _fundamental_ difference whether the create lambda is the actual expensive operation, or a cheap Lazy constructor whose .Value lambda is the expensive operation.
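To make the distinction concrete, a hedged sketch (LoadExpensive and the surrounding class are hypothetical names, not from this thread):

``` C#
using System;
using System.Threading;
using Microsoft.Extensions.Caching.Memory;

public static class LazyDifferenceExample
{
    // Hypothetical expensive operation.
    private static string LoadExpensive() => DateTime.UtcNow.ToString("O");

    public static void Demonstrate(IMemoryCache cache)
    {
        // Variant 1: the delegate itself is the expensive operation.
        // Under concurrent misses it may run once per racing caller.
        var a = cache.GetOrCreate("key1", entry => LoadExpensive());

        // Variant 2: the delegate is a cheap Lazy<T> constructor; the expensive
        // work happens in .Value. All callers holding the same Lazy instance
        // share a single invocation of LoadExpensive.
        var b = cache.GetOrCreate("key2", entry =>
            new Lazy<string>(LoadExpensive, LazyThreadSafetyMode.ExecutionAndPublication)).Value;
    }
}
```

Note that even variant 2 only helps if racing callers end up observing the same Lazy instance, which the demo in the next comment calls into question.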

I came across this issue as well and agree with @mbirtwistle: as it stands, you must manually place locks to guarantee consistent output. I have created a simple demo that runs 100K concurrent requests against MemoryCache.GetOrCreate and ConcurrentDictionary.GetOrAdd using a Lazy factory, with a side-by-side comparison showing that GetOrCreate is neither re-entrant nor protects its internal state, i.e. the payload of a never-expiring cache entry is updated by concurrent calls to GetOrCreate.

memorycache-vs-concurrentdictionary

The demo shows that GetOrCreate returns different instances of Lazy that yield different results, which somewhat defeats the purpose of moving the expensive operation into a Lazy factory. Please correct me if you see any mistakes in the way I constructed the demo; otherwise, I hope the aspnet team will look at this and address it in the next release.
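A minimal repro in the same spirit (a sketch, not the linked demo's actual code; the counter-based factory is made up for illustration):

``` C#
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

public static class Repro
{
    public static async Task Main()
    {
        var cache = new MemoryCache(new MemoryCacheOptions());
        var invocations = 0;

        // Fire many concurrent cache misses against the same key.
        var results = await Task.WhenAll(Enumerable.Range(0, 1000).Select(_ =>
            Task.Run(() => cache.GetOrCreate("key", entry =>
                // Each racing caller may run the factory; count the calls.
                Interlocked.Increment(ref invocations)))));

        // With an atomic GetOrCreate both numbers would be 1; in practice the
        // factory can run several times and callers observe different values.
        Console.WriteLine($"factory invocations: {invocations}");
        Console.WriteLine($"distinct results: {results.Distinct().Count()}");
    }
}
```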

Note: the demo code was extended from the related issue aspnet/Caching#359 by @amit-ezra.

Edit: mentioning specs
.NET Core 2.2
Microsoft.Extensions.Caching.Memory 2.2.0.0

It's especially an issue with newer devs: the general expectation of how it works is not met, which results in investigations and tickets and lowers people's opinion of .NET Core. Caching is pretty important.

Many people have this issue now without even knowing about it.

+1

These are the extensions I am currently using to ensure that get-or-add and set operations are atomic.
I tested them with a modified version of the code from @justinzhong, and they seem to behave as expected.
Synchronization is kept to a minimum: in case of a cache miss, a global concurrent dictionary of semaphores is consulted and a semaphore is acquired on a per-memory-cache-key basis.

``` C#
namespace Extensions
{
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

public static class MemoryCacheExtensions
{
    private static readonly ConcurrentDictionary<int, SemaphoreSlim> _semaphores = new ConcurrentDictionary<int, SemaphoreSlim>();

    public static async Task<(T Value, bool Created)> GetOrCreateAtomicAsync<T>(this IMemoryCache memoryCache, object key, Func<ICacheEntry, T> factory)
    {
        if (memoryCache.TryGetValue(key, out var value))
            return ((T)value, false);

        var isOwner = false;
        var semaphoreKey = (memoryCache, key).GetHashCode();
        if (!_semaphores.TryGetValue(semaphoreKey, out var semaphore))
        {
            SemaphoreSlim createdSemaphore = null;
            semaphore = _semaphores.GetOrAdd(semaphoreKey, k => createdSemaphore = new SemaphoreSlim(1)); // Try to add the value, this is not atomic, so multiple semaphores could be created, but just one will be stored!

            if (createdSemaphore != semaphore)
                createdSemaphore?.Dispose(); // This semaphore was not the one that made it into the dictionary; dispose it. (Null-conditional: the lambda may never have run.)
            else
                isOwner = true;
        }

        await semaphore.WaitAsync()
                       .ConfigureAwait(false); // Await the semaphore!
        try
        {
            if (!memoryCache.TryGetValue(key, out value))
            {
                var entry = memoryCache.CreateEntry(key);
                entry.SetValue(value = factory(entry));
                entry.Dispose();
                return ((T)value, true);
            }

            return ((T)value, false);
        }
        finally
        {
            if (isOwner)
                _semaphores.TryRemove(semaphoreKey, out _);
            semaphore.Release();
        }
    }

    public static async Task<T> SetAtomicAsync<T>(this IMemoryCache memoryCache, object key, Func<ICacheEntry, T, T> factory)
    {
        var isOwner = false;
        var semaphoreKey = (memoryCache, key).GetHashCode();
        if (!_semaphores.TryGetValue(semaphoreKey, out var semaphore))
        {
            SemaphoreSlim createdSemaphore = null;
            semaphore = _semaphores.GetOrAdd(semaphoreKey, k => createdSemaphore = new SemaphoreSlim(1)); // Try to add the value, this is not atomic, so multiple semaphores could be created, but just one will be stored!

            if (createdSemaphore != semaphore)
                createdSemaphore?.Dispose(); // This semaphore was not the one that made it into the dictionary, will not be used!
            else
                isOwner = true;
        }

        await semaphore.WaitAsync()
                       .ConfigureAwait(false); // Await the semaphore!
        try
        {
            var currentValue = default(T);
            if (memoryCache.TryGetValue(key, out var v))
                currentValue = (T)v;

            T newValue;
            var entry = memoryCache.CreateEntry(key);
            entry.SetValue(newValue = factory(entry, currentValue));
            entry.Dispose();

            return newValue;
        }
        finally
        {
            if (isOwner)
                _semaphores.TryRemove(semaphoreKey, out _);
            semaphore.Release();
        }
    }
}

}
```
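Usage then looks like this (ComputeExpensiveValue is a hypothetical factory):

``` C#
var (value, created) = await memoryCache.GetOrCreateAtomicAsync(
    "some-key", entry => ComputeExpensiveValue());
```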

@BladeWise I think you can simplify slightly: although GetOrAdd may run the factory more than once, it does return a consistent result, at least according to this blog post.

So I think

``` C#
SemaphoreSlim createdSemaphore = null;
locks.GetOrAdd(key, k => createdSemaphore = new SemaphoreSlim(1)); // Try to add the value; this is not atomic, so multiple semaphores could be created, but just one will be stored!
semaphore = locks[key]; // Re-read the value from the dictionary, to ensure the used value is exactly the value stored in the dictionary!
```

can reduce to

``` C#
SemaphoreSlim createdSemaphore = null;
semaphore = locks.GetOrAdd(key, k => createdSemaphore = new SemaphoreSlim(1)); // GetOrAdd returns the stored value, even if our lambda ran and lost the race.
```

You could go the whole hog and use a Lazy to create it, which would ensure only one semaphore was ever created; a sketch follows.
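Sketched out (the dictionary's value type becomes Lazy<SemaphoreSlim>; the class and method names are illustrative):

``` C#
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class KeyedLocks
{
    private static readonly ConcurrentDictionary<object, Lazy<SemaphoreSlim>> _locks =
        new ConcurrentDictionary<object, Lazy<SemaphoreSlim>>();

    public static SemaphoreSlim Get(object key)
    {
        // GetOrAdd may still race and create several Lazy wrappers, but only
        // one wins the slot, and Lazy's default ExecutionAndPublication mode
        // guarantees its SemaphoreSlim constructor runs at most once.
        return _locks.GetOrAdd(key,
            k => new Lazy<SemaphoreSlim>(() => new SemaphoreSlim(1))).Value;
    }
}
```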

@phatcher You are correct. Moreover, `semaphore = locks[key];` introduces a race condition, causing a key-not-found exception in case the key is created and removed by another thread while the current thread performs the double check. So just the GetOrAdd should be used, as you suggested.

The result that is added to the cache could differ from the one that is returned

@sebastienros Are you sure that the returned value can differ from the value that ends up being added to the cache? https://github.com/aspnet/Caching/issues/359#issuecomment-375995935 seems to contradict this:

And this would be the same with MemoryCache.AddOrGetExisting from .NET Framework. And by "same" I mean you would get the same value from all threads (even if the lambda was called multiple times).

If the latter is true, then the behavior matches ConcurrentDictionary and could be deemed correct and expected.

In that case, we could easily wrap the values in a Lazy with the right LazyThreadSafetyMode if we _truly_ want to prevent ever creating multiple values. And let's face it: creating multiple values simultaneously, with one winner, is generally acceptable. Either we lock and wait for the initial creator to finish, or we perform the same operation simultaneously and end up using its value. Generally, it's not worth fussing over unless the creation is excessively expensive or has side effects.

👋 @anurse - G'Day! Any chance this issue could get some prioritisation for the next 3.x iteration, please? It's catching a number of people out; I've been working with them over the last few months, after we stumbled across this issue six-odd months ago.

I guess I'm interested in why this is catching people out when it's generally the common pattern for lazy multi-threaded initialization like this. ConcurrentDictionary.GetOrAdd has the same approach.

Locking around user-provided delegates is generally a bad approach, as it takes control away from the user and makes it easy to unintentionally deadlock. I suppose my initial observation is this: is it better to unintentionally run the delegate multiple times (because the consumer didn't realize the delegate would be called multiple times) or to unintentionally deadlock (because the consumer didn't realize they were under a lock and tried to re-access the cache)? The cache is still atomic with regard to applying the result of the delegate to the cache. The user is always in a position to lock within their delegate to prevent multiple invocations of expensive code paths (or use something like Lazy<T> to do the locking for them).
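To make the deadlock hazard concrete, a hypothetical fragment (this is not current behavior; BuildHeader and BuildReport are made-up names):

``` C#
// Hypothetical: suppose GetOrCreate took a per-key lock before invoking the
// delegate. A delegate that re-enters the cache for the same key would then
// block forever on the lock its own call chain already holds.
var report = cache.GetOrCreate("report", entry =>
{
    // Re-entrant call on the same key: self-deadlock under an internal lock.
    var header = cache.GetOrCreate("report", e => BuildHeader());
    return BuildReport(header);
});
```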

Regardless of the outcome of that discussion, I don't expect a change would come in the 3.x milestone. This would be a significant behavioral change which would not be done in a 3.0.x patch. And the 3.1 release is pretty much fully booked at this point (and the bar for changes is quite high given the very short timescale, there's little chance for feedback). The earliest I'd expect a change here is 5.0.

It's catching a number of people out

@PureKrome In https://github.com/aspnet/Extensions/issues/708#issuecomment-494393036 I mentioned why the current behavior (if indeed it matches ConcurrentDictionary) seems expected and usual to me.

Do you think this should behave differently from that? If so, what makes this case different, and what makes the alternative implementation more robust or more intuitive than the current one?

@Timovzl This whole page, the blog, etc. is about how it's unusual and how it should behave, e.g. GetOrCreate being atomic. There is no disagreement on this, and no, it's NOT like ConcurrentDictionary: that call is atomic and part of ConcurrentDictionary itself, not two calls done by an extension method. It has the same API signature, which is good, but the behavior is different.

The only reason I can see it wasn't done in the first place is that doing it means either changing IMemoryCache, which is breaking, or adding IMemoryCache2 : IMemoryCache, which is ugly and requires new tests, and so people don't want to fix it. But 3.0 would be the right time for this; maybe add an IAtomicMemoryCache.

@anurse thanks heaps for the prompt reply and for popping back into this thread - really appreciate it!

I guess I'm interested in why this is catching people out when it's generally the common pattern for lazy multi-threaded initialization like this. ConcurrentDictionary.GetOrAdd has the same approach.

Great question, and I think this is the crux of the situation: it's the knowledge of how to use all of this correctly. When "we" (friends, colleagues, etc.) all first started using this, we incorrectly thought it was handling the delegate 'atomically' (is that the correct term here?).

So for me, I'm not sure if this could be handled better with code or just documentation. Maybe the documentation is sufficient and I've not found it -or- it's been updated recently'ish and I've just not noticed it because I sorta know how to solve this problem, now.

BTW, the solutions you posted are what I've found too. But it's easy to know what to do when you've been given the answer. For me, this post by Andrew Lock was the lightbulb moment that taught me what to do (hint: it's to use Lazy<T>, just like you said). But it so wasn't obvious to me and others, for ages.

Maybe the documentation is sufficient and I've not found it

To me this indicates the documentation isn't sufficient :). If it's not in a place that you can easily find it, there's a gap to be filled. The API reference docs are certainly lacking here.

So certainly some action to take here is to improve those docs.

Sweet! Also,

Maybe the documentation is sufficient and I've not found it

that's a bad typo on my behalf. I meant to say: "Maybe the documentation isn't sufficient and I've not found it ..."

Now I'll just chill until the awesome Docs team get some bandwidth to review this, then :)

EDIT: @ Docs team - it would be lovely to have a number of examples/scenarios handled in the docs, if possible :) e.g. the typical scenario of loading some database resource and caching it locally in memory forever (until the app restarts), or in memory for a fixed amount of time (e.g. 1 hour) with invalidation, etc.

My request way back in this thread was for the documentation to make absolutely clear that the initialiser will likely get called multiple times in parallel if your app is processing multiple requests during cache warm-up. Once you know this, there is a simple workaround: make the cache return a Lazy, with your initialiser as the Lazy delegate, so the Lazy can implement the locking that ensures a single delegate callback occurs. A Lazy example would be useful. It's up to the cache user to decide whether to return a T or a Lazy<T> from the cache. They just need to know the facts about MemoryCache's GetOrCreate approach to locking to make an informed decision.


The workaround is neither easy nor obvious. I bet something like 10-100K devs are affected without knowing it; it will take a long time for the knowledge of why you have this "useless" Lazy or lock delegate to spread around, and then, when it is fixed or a new memory cache comes along, it all needs to be reverted.

Documentation and a few official blog posts can certainly help, but it's a pretty big workaround/smell IMHO.

My request way back in this thread was for the documentation to make absolutely clear that the initialiser will likely get called multiple times in parallel if your app is processing multiple requests during cache warm-up. Once you know this, there is a simple workaround: make the cache return a Lazy, with your initialiser as the Lazy delegate, so the Lazy can implement the locking that ensures a single delegate callback occurs.

I concur. Lazy<T> is the go-to mechanism for these things and a way to avoid duplicating thread-safety mechanisms all over the place. On top of that, we get support for both the usual case (multiple initial factory invocations) and the strict case (one and only one factory invocation).

The documentation should definitely mention the behavior, and for those unfamiliar with using Lazy<T> to guarantee one-time creation, that could even be added as a recommendation.

Lazy is not a "go-to" for thread safety/locking; it is only for cases where you actually want to lazy-load, and in this case you don't, either. It's sort of a hack / an extra level of indirection, but it will work.

The question is: if the cache needs this, then IT should do this behind the scenes. However, as I said above, the poor design decision now leaves only a choice of convenience, so it's OK but not good.

Wrapping everything in Lazy<...> via IMemoryCache gets pretty unwieldy, and I know many people would criticize this as overly complex/confusing.

That said, it's probably the best choice, provided it's documented/blogged, unless the team wants to change IMemoryCache for another reason; if so, the atomic op should be added and then we don't need Lazy.

I couldn't figure out the best area label to add to this issue. Please help me learn by adding exactly one area label.

Another workaround that implements the use of Lazy, and the locking required to stop a different version ending up in the cache, is to use LazyCache, a library available on NuGet that I wrote to solve these issues.

It wraps MemoryCache with the additional stuff required to make the lambda atomic as requested in this issue. It is also likely to perform better than the nice code provided by @BladeWise because of some smarts in the way the locks are collated by cache key and do not use SemaphoreSlim. Hopefully it helps some people who might stumble on this issue when their cache is not working as expected.
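Typical usage looks like this (a sketch based on LazyCache's README; the loader and product types are hypothetical stand-ins):

``` C#
using LazyCache;

public class ProductService
{
    private readonly IAppCache _cache = new CachingService();

    public Product GetProduct(int id)
    {
        // GetOrAdd runs the factory at most once per key, even under
        // concurrent misses; all callers receive the same cached value.
        return _cache.GetOrAdd($"product-{id}", () => LoadProductFromDb(id));
    }

    // Hypothetical expensive loader.
    private Product LoadProductFromDb(int id) => new Product { Id = id };
}

public class Product
{
    public int Id { get; set; }
}
```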

