Lock-free hash table implementation
Microsoft.DotNet.XUnitConsoleRunner v2.5.0 (64-bit .NET Core 5.0.0)
Discovering: System.Collections.Concurrent.Tests (method display = ClassAndMethod, method display options = None)
Discovered: System.Collections.Concurrent.Tests (found 874 test cases)
Starting: System.Collections.Concurrent.Tests (parallel test collections = on, max threads = 12)
Finished: System.Collections.Concurrent.Tests
=== TEST EXECUTION SUMMARY ===
System.Collections.Concurrent.Tests Total: 2254, Errors: 0, Failed: 0, Skipped: 0, Time: 569.611s
API changes:
This class was written from scratch. The new design does not use concurrencyLevel for load balancing, so users no longer need to specify it. Accordingly, three new constructors without a concurrencyLevel parameter were added. The older constructors still work but are marked as obsolete. In addition, another overload of AddOrUpdate(TKey key, TValue addValue, TValue updateValue) was added.
Thus, the new class is fully compatible with existing code that uses it.
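For illustration, here is a small sketch (my own, not from the PR) of the semantics the new overload presumably provides, expressed via the delegate-based AddOrUpdate that already ships today, assuming the new overload is shorthand for a constant update value:

using System;
using System.Collections.Concurrent;

class AddOrUpdateDemo
{
    static void Main()
    {
        var map = new ConcurrentDictionary<string, int>();
        map.AddOrUpdate("hits", 1, (key, oldValue) => 2); // key absent: adds 1
        map.AddOrUpdate("hits", 1, (key, oldValue) => 2); // key present: stores 2
        Console.WriteLine(map["hits"]); // prints 2
    }
}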
@MBurtsev the Benchmarks link is giving me a 404? Is it a private repo?
@QueenCitySmitty yeah, sorry :)
Can see now, thanks.
@stephentoub
Thanks! This looks like cool work.
One way you could help evaluate this is to add more benchmarks from different angles, e.g. zero contention, low contention, high contention, mixes of operation types, collisions, key/value types/sizes, NUMA, and so on. You might also check the dotnet/performance repo to see what existing benchmarks are available.
@MaximLipnin the existing benchmarks are in https://github.com/dotnet/performance/tree/master/src/benchmarks/micro/libraries/System.Collections. There are very good markdown docs in there explaining how to run them against private changes.
@danmosemsft I looked at the tests you pointed to in that folder. Using Tasks to measure performance is not correct for multithreaded algorithms: Tasks are launched on the thread pool, which does not guarantee that a thread starts immediately. To obtain objective measurements, it is extremely important that the threads start as close to simultaneously as possible. For this purpose, I use a helper class that makes accurate tests possible with BenchmarkDotNet.
@stephentoub
I added a lot of unit tests that demonstrate the absence of bugs. I would like to ask you to review this code if you have time.
@jkotas As it turned out, my approach consumes three times less memory, because there is no linked list in the bucket: node objects and heap references take up additional memory beyond the useful data.
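One way to sanity-check such memory claims (a sketch of my own, not from the thread) is to measure managed heap growth per entry; the shipped ConcurrentDictionary allocates a node object per entry, each carrying an object header, a next reference, and a cached hash code on top of the key/value payload:

using System;
using System.Collections.Concurrent;

// Rough, unscientific per-entry overhead estimate; numbers vary by runtime
// and GC state. This is an illustration, not a benchmark.
class MemoryEstimate
{
    static void Main()
    {
        const int n = 1_000_000;
        long before = GC.GetTotalMemory(forceFullCollection: true);
        var d = new ConcurrentDictionary<int, int>();
        for (int i = 0; i < n; i++) d.TryAdd(i, i);
        long after = GC.GetTotalMemory(forceFullCollection: true);
        Console.WriteLine($"~{(after - before) / (double)n:F1} bytes per entry");
        GC.KeepAlive(d); // keep the dictionary alive through the second measurement
    }
}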
Using Tasks to measure performance is not correct for multithreaded algorithms.
@adamsitnik thoughts about this?
Tasks are launched on the thread pool, which does not guarantee that a thread starts immediately.
Not true here.
a) these are "LongRunning" tasks; each spins up its own dedicated thread.
b) even if that weren't the case, barriers are used to block all the workers until they all reach the point to be measured, and measurement doesn't start until every thread has arrived.
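A minimal sketch of the pattern being described (not the actual benchmark harness): LongRunning tasks each get a dedicated thread, and a Barrier holds them all until every worker has arrived, so the measured region starts simultaneously on all threads.

using System;
using System.Threading;
using System.Threading.Tasks;

class BarrierSketch
{
    static void Main()
    {
        const int workers = 4;
        using var barrier = new Barrier(workers);
        var tasks = new Task[workers];

        for (int i = 0; i < workers; i++)
        {
            tasks[i] = Task.Factory.StartNew(() =>
            {
                barrier.SignalAndWait(); // no worker proceeds until all have arrived
                // ... measured work goes here, started simultaneously on all threads ...
            }, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);
        }

        Task.WaitAll(tasks);
        Console.WriteLine("done");
    }
}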
Not true here.
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApp4
{
    class Program
    {
        static void Main(string[] args)
        {
            var count = 10_000_000;
            var result = 0;
            var threads = 16;
            var ready = 0;
            double start_time = Stopwatch.GetTimestamp();

            for (var i = 0; i < threads; ++i)
            {
                Task.Factory.StartNew(() =>
                {
                    // Uncomment to make the workers wait for each other:
                    //Interlocked.Add(ref ready, 1);
                    //while (Volatile.Read(ref ready) < threads)
                    //{
                    //}

                    // How long after Main started did this task actually begin running?
                    var time = (Stopwatch.GetTimestamp() - start_time) * 1_000_000_000d / Stopwatch.Frequency;
                    Console.WriteLine($"My time {time}ns");

                    for (var n = 0; n < count; ++n)
                    {
                        Interlocked.Add(ref result, 1);
                    }

                    Console.WriteLine(result);
                }, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);
            }

            Console.ReadLine();
        }
    }
}
My time 83843700ns
My time 83788400ns
My time 83779000ns
My time 83805800ns
My time 83783400ns
My time 83797100ns
My time 83814100ns
My time 83823500ns
My time 83827300ns
My time 83845900ns
My time 83850400ns
My time 83807800ns
My time 86535300ns
My time 90237900ns
My time 92942700ns
My time 95103600ns
@stephentoub Yes, you are right, I can use Tasks if barriers are used; that is obvious. In my tests, I started threads manually and used barriers the same way, and I don't see a problem with this.
thoughts about this?
As @stephentoub has explained, our benchmarks are correct.
@MBurtsev could you please follow the instructions from this doc and run the ConcurrentDictionary benchmarks against your local build of the .NET runtime?
It should be a matter of running a single command:
dotnet run -c Release -f netcoreapp5.0 --filter '*ConcurrentDictionary*' --corerun $before $after --join
Once you share the results we should have more insight into the impact of your proposed changes.
I'm confused: I tried very simple code and it doesn't work with this implementation of ConcurrentDictionary.
Basically, if two keys have the same hash code, the attempt to add the second key fails.
Looking at the implementation, I see that the code does not support this case at all, and possibly cannot by design.
Am I missing anything?
Below is an example of the code. It is simplified on purpose, but you should expect hash-code conflicts in real life.
public struct Foo : IEquatable<Foo>
{
    int v;
    public Foo(int v) { this.v = v; }
    public bool Equals(Foo f) { return this.v == f.v; }
    public override bool Equals(object obj) { return obj is Foo && ((Foo)obj).v == this.v; }
    public override int GetHashCode() { return 1; }
}
...
var d = new ConcurrentDictionary<Foo, int>();
d.Add(new Foo(1), 1);
d.Add(new Foo(2), 2); // This throws
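For contrast, a minimal self-contained sketch (reusing the same constant-hash Foo) showing that the shipped ConcurrentDictionary accepts both keys, since colliding entries are chained within the bucket and disambiguated by Equals:

using System;
using System.Collections.Concurrent;

public struct Foo : IEquatable<Foo>
{
    private readonly int v;
    public Foo(int v) { this.v = v; }
    public bool Equals(Foo f) => v == f.v;
    public override bool Equals(object obj) => obj is Foo f && f.v == v;
    public override int GetHashCode() => 1; // every key collides on purpose
}

class CollisionDemo
{
    static void Main()
    {
        var d = new ConcurrentDictionary<Foo, int>();
        Console.WriteLine(d.TryAdd(new Foo(1), 1)); // True
        Console.WriteLine(d.TryAdd(new Foo(2), 2)); // True: same hash, distinct key
        Console.WriteLine(d.Count);                 // 2
    }
}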
@MBurtsev are you willing to continue working on this?
@SergeyLavr Did you run your code yourself? It doesn't compile.
And secondly:
public override int GetHashCode() { return 1; }
If you override GetHashCode(), make sure you do it correctly.
If you override GetHashCode(), make sure you do it correctly.
Always returning 1 is a valid GetHashCode (even if it's bad for perf).
Yes. Now read the paragraph after that.
Am I getting it right that this behavior is vital in the standard ConcurrentDictionary?
Yes, in all .NET collections.
@stephentoub OK. In this case, this topic can be closed.
Ok. Thanks for your efforts.
@En3Tho Sometimes, to achieve a goal, one needs to sacrifice something. .NET will most likely always be a low-performance platform ;)
@MBurtsev, I'm not sure how you draw that conclusion. Regardless, you should feel empowered to publish your collection as a nuget package for others to consume. If it yields improved performance for consumers and those consumers are aware of and can work with the limitations, everyone wins. That said, "sacrificing" correctness is a pretty big limitation.
@stephentoub That was just emotion. Actually, I have other ideas for how to achieve significantly higher performance, using an event-journal architecture. But that would require significantly changing the API signatures, making them incompatible with the standard library; for example, TryAdd(key, value, callback). Perhaps when I have time I will do it. Good luck to you.
@SergeyLavr Did you run your code yourself? It doesn't compile.
And secondly:
public override int GetHashCode() { return 1; }
If you override GetHashCode(), make sure you do it correctly.
Yes, I ran the code myself. I had to update the original to compile on .NET Framework 4.7.2 because I was interested in porting it to an older version; sorry for the confusion, I should have mentioned that from the beginning. I removed notnull and some interfaces just to quickly check functionality before making the bigger effort of porting.
However, this doesn't change the main point: based on my review of the code, I was especially suspicious of this functionality, and it failed.
As for the public override int GetHashCode() { return 1; } implementation: as was noted before, it is a simplified example to demonstrate the problem. Take string.GetHashCode, implemented correctly with all possible perf optimizations: you can still get the same hash code from two different strings. Hash collisions are not avoidable, so if a dictionary fails on them, it is simply not correct.
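To make the pigeonhole point concrete, here is a small sketch (my own, not from the thread) that brute-forces a string hash collision in the current process; since hash codes are 32-bit, the birthday paradox makes one appear within roughly 100k candidates:

using System;
using System.Collections.Generic;

class HashCollisionSearch
{
    static void Main()
    {
        // String hashing is randomized per process on .NET Core, so the
        // colliding pair differs from run to run, but one always exists.
        var seen = new Dictionary<int, string>();
        for (long i = 0; ; i++)
        {
            string s = i.ToString();
            int h = s.GetHashCode();
            if (seen.TryGetValue(h, out var other))
            {
                Console.WriteLine($"\"{other}\" and \"{s}\" share hash code {h}");
                break;
            }
            seen[h] = s;
        }
    }
}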
@SergeyLavr Yes, you are right. A linked-list bucket is probably the only way. If the hash code were of type long, then for reference types the address of the object in memory could act as a hash code, or some kind of descriptor table could guarantee unique hash codes for objects whose hash functions can cause collisions. But these are fantastic thoughts :)
the address of the object in memory could act as a hash code
The GC can move objects.
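For what it's worth, .NET already exposes a stable identity-based hash that survives GC compaction precisely because it is not derived from the object's (movable) address: RuntimeHelpers.GetHashCode. A quick sketch:

using System;
using System.Runtime.CompilerServices;

class IdentityHashDemo
{
    static void Main()
    {
        var a = new object();
        int before = RuntimeHelpers.GetHashCode(a);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        int after = RuntimeHelpers.GetHashCode(a);
        Console.WriteLine(before == after); // True: identity hash is stable for the object's lifetime
    }
}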
Java has a lock-free hashtable implementation. It is detailed in this talk: https://www.youtube.com/watch?v=HJ-719EGIts
The table is completely lock-free, even for growing. The author developed it for machines that have 768 CPU cores all using the same hashtable. Scalability appears to be quite good.
@GSPP: I saw this video a while back and it is very cool. I've always wondered whether anything like this was considered for .NET. However, 768-core machines are very different from the machines most .NET code runs on. I'd love to see an analysis of the pros and cons of this implementation (perf, memory, GC pressure, etc.).