I expect some(!) overhead when using ConcurrentDictionary over Dictionary due to its thread safety, but these simple tests are way beyond anything I expected. Could it be that ConcurrentDictionary has missed some performance improvements, or what makes this difference so huge? Not only CPU time but also memory usage is heavily affected: +87% CPU time and +80% memory usage. Other concurrent collection types are more or less acceptable, see edit below.
BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363
Intel Core i7-4960X CPU 3.60GHz (Haswell), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=3.1.100
[Host] : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT
Job-BIJLNP : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT
Runtime=.NET Core 3.1 InvocationCount=1 UnrollFactor=1
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------- |----------:|---------:|---------:|-----------:|----------:|----------:|----------:|
| List | 31.42 ms | 0.165 ms | 0.146 ms | 2000.0000 | 1000.0000 | - | 18.31 MB |
| ObservableCollection | 151.91 ms | 2.948 ms | 2.613 ms | 13000.0000 | 5000.0000 | 2000.0000 | 83.16 MB |
| Dictionary | 71.15 ms | 0.541 ms | 0.479 ms | 2000.0000 | 1000.0000 | - | 18.31 MB |
| ConcurrentDictionary | 566.53 ms | 3.922 ms | 3.476 ms | 15000.0000 | 9000.0000 | 3000.0000 | 110.23 MB |
Benchmark
public struct Entry
{
    public int Value;
    public string Name;
}

[CoreJob]
[MemoryDiagnoser]
[RPlotExporter]
public class ObjectTest
{
    private List<Entry> ListCollection = new List<Entry>();
    private ObservableCollection<Entry> ObsCollection = new ObservableCollection<Entry>();
    private Dictionary<string, Entry> DictionaryCollection = new Dictionary<string, Entry>();
    private ConcurrentDictionary<string, Entry> ConcurrentDictionaryCollection = new ConcurrentDictionary<string, Entry>();

    private const int counter = 500000;

    [IterationCleanup]
    public void Cleanup()
    {
        ListCollection.Clear();
        ObsCollection.Clear();
        DictionaryCollection.Clear();
        ConcurrentDictionaryCollection.Clear();
    }

    [Benchmark]
    public bool List()
    {
        for (int i = 0; i < counter; i++)
        {
            var entry = new Entry { Name = i.ToString(), Value = i };
            ListCollection.Add(entry);
        }
        return true;
    }

    [Benchmark]
    public bool ObservableCollection()
    {
        for (int i = 0; i < counter; i++)
        {
            var entry = new Entry { Name = i.ToString(), Value = i };
            ObsCollection.Add(entry);
        }
        return true;
    }

    [Benchmark]
    public bool Dictionary()
    {
        for (int i = 0; i < counter; i++)
        {
            var entry = new Entry { Name = i.ToString(), Value = i };
            DictionaryCollection.Add(entry.Name, entry);
        }
        return true;
    }

    [Benchmark]
    public bool ConcurrentDictionary()
    {
        for (int i = 0; i < counter; i++)
        {
            var entry = new Entry { Name = i.ToString(), Value = i };
            ConcurrentDictionaryCollection.TryAdd(entry.Name, entry);
        }
        return true;
    }
}
Edit: Other Concurrent Collection Results
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------- |---------:|---------:|---------:|----------:|----------:|------:|----------:|
| List | 31.56 ms | 0.169 ms | 0.150 ms | 2000.0000 | 1000.0000 | - | 18.31 MB |
| ConcurrentBag | 44.42 ms | 0.130 ms | 0.121 ms | 2000.0000 | 1000.0000 | - | 18.31 MB |
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------- |---------:|---------:|---------:|----------:|----------:|----------:|----------:|
| Queue | 32.89 ms | 0.368 ms | 0.326 ms | 2000.0000 | 1000.0000 | - | 18.31 MB |
| ConcurrentQueue | 42.40 ms | 0.377 ms | 0.353 ms | 3000.0000 | 2000.0000 | 1000.0000 | 30.31 MB |
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------- |---------:|---------:|---------:|----------:|----------:|----------:|----------:|
| Stack | 32.40 ms | 0.347 ms | 0.325 ms | 2000.0000 | 1000.0000 | - | 18.31 MB |
| ConcurrentStack | 62.64 ms | 0.708 ms | 0.628 ms | 5000.0000 | 3000.0000 | 1000.0000 | 37.38 MB |
It is optimized for lock-free reads, while trying to remain scalable for writes. Adds require both allocating and locking, so the extra cost is expected.
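To make the "allocating and locking" point concrete, here is a deliberately simplified model of a striped-lock hash map. This is NOT the real ConcurrentDictionary implementation, just an illustration of why each add carries both a heap allocation and a lock acquisition that a plain Dictionary add does not:

```csharp
using System;
using System.Collections.Generic;

// Simplified sketch: each add allocates a reference-type node (GC pressure)
// and acquires one of a small set of stripe locks (synchronization cost).
// Dictionary, by contrast, stores entries in flat arrays with no locking.
public sealed class TinyStripedMap<TKey, TValue>
{
    private sealed class Node
    {
        public TKey Key;
        public TValue Value;
        public Node Next;
    }

    private readonly Node[] _buckets = new Node[1024];
    private readonly object[] _locks = new object[16];

    public TinyStripedMap()
    {
        for (int i = 0; i < _locks.Length; i++) _locks[i] = new object();
    }

    public bool TryAdd(TKey key, TValue value)
    {
        int hash = key.GetHashCode() & 0x7fffffff;
        int bucket = hash % _buckets.Length;
        lock (_locks[bucket % _locks.Length])   // per-stripe lock, not a global one
        {
            for (Node n = _buckets[bucket]; n != null; n = n.Next)
                if (EqualityComparer<TKey>.Default.Equals(n.Key, key))
                    return false;
            // One heap allocation per add -- this is the "allocating" part.
            _buckets[bucket] = new Node { Key = key, Value = value, Next = _buckets[bucket] };
            return true;
        }
    }
}
```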
If you expected it to be ~80% slower due to thread safety, please see my edit on the other concurrent types, which do not show SUCH a huge difference while also being thread safe.
It is a fundamentally different type; you're comparing apples and oranges. What is it you expect to see done here? Do you have a concrete suggestion for how to improve add performance, which is what you're examining?
Why are you always giving me such "less nice" answers? What did I ever do to you!? I'm just reporting what I see, and I'm disappointed that the ConcurrentDictionary collection is completely out of bounds in these tests. I also see a regression compared to .NET Framework, while all other (concurrent) collections have been improved.
| Method | Runtime | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------- |-------------- |----------:|----------:|----------:|----------:|-----------:|----------:|----------:|----------:|
| Dictionary | .NET 4.8 | 91.62 ms | 2.836 ms | 8.137 ms | 89.18 ms | 2000.0000 | 1000.0000 | - | 19.12 MB |
| ConcurrentDictionary | .NET 4.8 | 470.92 ms | 9.379 ms | 24.378 ms | 460.93 ms | 18000.0000 | 8000.0000 | 3000.0000 | 112.57 MB |
| Stack | .NET 4.8 | 51.67 ms | 0.380 ms | 0.318 ms | 51.71 ms | 2000.0000 | 1000.0000 | - | 19.12 MB |
| ConcurrentStack | .NET 4.8 | 84.68 ms | 1.101 ms | 0.976 ms | 84.72 ms | 7000.0000 | 3000.0000 | 1000.0000 | 38.25 MB |
| Dictionary | .NET Core 3.1 | 77.09 ms | 1.517 ms | 1.747 ms | 77.20 ms | 2000.0000 | 1000.0000 | - | 18.31 MB |
| ConcurrentDictionary | .NET Core 3.1 | 625.43 ms | 12.447 ms | 23.379 ms | 629.70 ms | 15000.0000 | 9000.0000 | 3000.0000 | 109.63 MB |
| Stack | .NET Core 3.1 | 33.31 ms | 0.342 ms | 0.303 ms | 33.25 ms | 2000.0000 | 1000.0000 | - | 18.31 MB |
| ConcurrentStack | .NET Core 3.1 | 66.71 ms | 1.331 ms | 3.032 ms | 66.17 ms | 5000.0000 | 3000.0000 | 1000.0000 | 37.38 MB |
I'm not trying to give you "less nice" answers; I'm sorry that's how it's coming across. I'm stating facts, and trying to understand the intent of the issue. From what I see in your initial stats, the collection is behaving as expected, and given that, unless there's a concrete suggestion about something that's wrong in the implementation or a suggestion for how it could improve, it's not actionable, which means the issue will just sit here forever.
Now, if your initial post had said "ConcurrentDictionary is in my test 50% slower than netfx", as is suggested in your last comment, that is something to be investigated.
I also see a regression comparing to .NET Framework
I took a look, and while I also see similar numbers, I don't believe it's due to a regression in ConcurrentDictionary. Rather, in the shared benchmark the key passed to ConcurrentDictionary is a string, and in .NET Core, string hashing via EqualityComparer<string>.Default or StringComparer.Ordinal always uses randomized hashing, whereas in .NET Framework, you have to opt in to it (https://docs.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/runtime/userandomizedstringhashalgorithm-element). When I turn that on in .NET Framework, the difference goes away:
| Method | Runtime | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------- |-------------- |---------:|---------:|---------:|-----------:|----------:|----------:|----------:|
| ConcurrentDictionary | .NET 4.7.2 | 563.5 ms | 11.18 ms | 15.30 ms | 18000.0000 | 8000.0000 | 3000.0000 | 109.68 MB |
| ConcurrentDictionary | .NET Core 3.1 | 551.6 ms | 10.89 ms | 13.77 ms | 17000.0000 | 7000.0000 | 3000.0000 | 108.93 MB |
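For reference, opting in to randomized string hashing on .NET Framework is done in app.config (per the element documented at the link above):

```xml
<configuration>
  <runtime>
    <UseRandomizedStringHashAlgorithm enabled="1" />
  </runtime>
</configuration>
```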
Similarly, if in the benchmark I change the key to just be the int itself rather than i.ToString(), the "regression" goes away (actually Core wins):
| Method | Runtime | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------- |-------------- |---------:|--------:|---------:|-----------:|----------:|----------:|----------:|
| ConcurrentDictionary | .NET 4.7.2 | 171.9 ms | 3.44 ms | 10.14 ms | 13000.0000 | 6000.0000 | 2000.0000 | 76.89 MB |
| ConcurrentDictionary | .NET Core 3.1 | 148.6 ms | 3.17 ms | 9.35 ms | 13000.0000 | 6000.0000 | 2000.0000 | 75.94 MB |
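A minimal sketch of that int-keyed variant, written as an extra member of the ObjectTest benchmark class above (the field and method names here are illustrative, not from the original benchmark):

```csharp
private ConcurrentDictionary<int, Entry> ConcurrentDictionaryIntCollection =
    new ConcurrentDictionary<int, Entry>();

[Benchmark]
public bool ConcurrentDictionaryIntKey()
{
    for (int i = 0; i < counter; i++)
    {
        // No i.ToString(): the key is the int itself, so neither the string
        // allocation nor randomized string hashing is in play.
        var entry = new Entry { Name = null, Value = i };
        ConcurrentDictionaryIntCollection.TryAdd(i, entry);
    }
    return true;
}
```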
I suppose it would be much more complicated in ConcurrentDictionary to do the same trick we do in Dictionary, starting off with non-randomized string hashes.
Maybe the performance can be improved by adding an overload which disables randomized string hashing per ConcurrentDictionary instance, or does this have to be per application base?!
Maybe the performance can be improved by adding an overload which disables randomized string hashing per ConcurrentDictionary instance, or does this have to be per application base?!
You can make it a per-instance thing yourself pretty easily (provided you make the comparer yourself):
new ConcurrentDictionary<string, Entry>(NonRandomizingStringHasher.Instance);
@Symbai FWIW, I just use a locked Dictionary unless I know there are going to be A LOT of concurrent accesses. It's way faster for low to moderate contention. Less allocatey, too.
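The "locked Dictionary" approach mentioned above can be sketched as a thin wrapper; this is a minimal illustration (type and member names are made up for the example), not a drop-in ConcurrentDictionary replacement:

```csharp
using System.Collections.Generic;

// A plain Dictionary guarded by a single lock: no per-entry node
// allocations, one simple lock, fine for low to moderate contention.
public sealed class LockedDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _dict = new Dictionary<TKey, TValue>();
    private readonly object _gate = new object();

    public bool TryAdd(TKey key, TValue value)
    {
        lock (_gate)
        {
            if (_dict.ContainsKey(key))
                return false;
            _dict.Add(key, value);
            return true;
        }
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        lock (_gate)
        {
            return _dict.TryGetValue(key, out value);
        }
    }
}
```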
I suppose it would be much more complicated in ConcurrentDictionary to do the same trick we do in Dictionary, starting off with non-randomized string hashes.
There'd be some complexity and perf impact for non-string scenarios. We'd need to move the comparer from being stored on the dictionary to being stored on the tables object, so that when we resize and swap in a new tables instance, the comparer and the hashed data are always in sync. Related, we'd also need to change the code paths that currently access the comparer before accessing the tables, as they'd now have to get the comparer from the tables (that will have a negative impact on compound operations that could previously hash only once and now may need to hash multiple times, effectively undoing https://github.com/dotnet/corefx/pull/426). And we'd need to figure out what heuristic to use for determining whether we've passed a sufficient hash collision rate to trigger the upgrade, since adds are local to a given partition in the dictionary. Not a slam dunk.
Triage: the issue seems to cover two independent considerations: general ConcurrentDictionary.Add() performance and randomized string hashcode generation performance. The former doesn't seem very actionable and the second seems to warrant a separate issue.
Closing this as we deem it to be non-actionable. Please feel free to re-open if there are other items to discuss.