The main goal of my study was to ensure that we ship .NET 5.0 without any performance regressions and validate whether in the near future we can fully rely on the regressions auto-filing bot written by @DrewScoggins.
My other goal was to get .NET Library Team members involved and keep on growing the performance culture.
tl;dr: The bot is doing a great job of detecting regressions. The most serious regressions have already been fixed; however, a few investigations are still in progress.
In 2018 I had the pleasure of reviewing @AndreyAkinshin's "Pro .NET Benchmarking" book. The "Statistics for Performance Engineers" and "Performance Analysis and Performance Testing" chapters inspired me to implement a small tool called Results Comparer. The tool uses the Mann-Whitney U statistical test to detect performance regressions in results exported by BenchmarkDotNet. It's being used (or at least it should be) as part of our benchmarking workflow to prevent introducing regressions to .NET.
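For readers unfamiliar with the Mann-Whitney U test mentioned above, here is a minimal sketch of how such a comparison could work. It is not the Results Comparer source: the function names, the `alpha` threshold, and the sample values are assumptions made for illustration, and it uses the normal approximation without a tie correction.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_p_greater(first, second):
    """One-sided p-value for 'first tends to be larger than second'.

    Uses the normal approximation of the U statistic (no tie correction).
    """
    n1, n2 = len(first), len(second)
    ranks = average_ranks(list(first) + list(second))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)


def classify(base, diff, alpha=0.05):
    """Label the diff measurements relative to base (hypothetical helper)."""
    if mann_whitney_p_greater(diff, base) < alpha:
        return "Slower"  # diff times are significantly larger
    if mann_whitney_p_greater(base, diff) < alpha:
        return "Faster"  # diff times are significantly smaller
    return "Same"


# Made-up measurements, for illustration only.
base = [100, 102, 98, 101, 99, 103, 97, 100.5, 101.5, 99.5]
diff = [x * 1.5 for x in base]
print(classify(base, diff))
```

Because the test works on ranks rather than raw values, a single outlier in one run has limited influence on the conclusion, which is useful for noisy microbenchmarks.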
In 2019 I was asked by @danmosemsft to verify .NET Core 3.0 performance. Initially, I ran all the microbenchmarks from the dotnet/performance repository on a single machine with dual boot for Windows 10 and Ubuntu 18.04 x64 and used the Results Comparer to find regressions. It very quickly turned out that such a sample was way too small to make sure that we don't have any regressions. Some benchmarks were simply unstable, and some architectures like ARM and ARM64 were simply not covered. Other Linux distros and CPU families were also not covered.
Then I ran the benchmarks on all the PCs, laptops, and VMs that I could access. But I was still missing AMD and ARM results, so I asked @tannergooding and @BruceForstall for help. @tannergooding ran the benchmarks on all his AMD machines. @BruceForstall gave me access to a document that explains how to use the ARM machines owned by the JIT Team. This turned out to be invaluable help, as I've used these machines many, many times, including this year during the 5.0 investigation.
After having enough samples to cover our matrix of supported OSes and architectures, I built a simple console app on top of ResultsComparer (source code available here). The tool uses the very same statistical test to detect regressions, aggregates the results from all different configurations, and sorts them from the biggest regression to the biggest improvement.
Such an approach allows for very quick identification of regressions of all kinds:
| Result | Base | Diff | Ratio | Operating System | Bit |
| ------ | -------:| --------:| -----:| ----------------------- | ----- |
| Slower | 570.88 | 3069.76 | 0.19 | Windows 10.0.19041.388 | X64 |
| Slower | 610.20 | 3674.19 | 0.17 | Windows 10.0.18363.959 | X64 |
| Slower | 598.37 | 3519.26 | 0.17 | Windows 10.0.18363.959 | X64 |
| Slower | 700.86 | 4238.85 | 0.17 | Windows 10.0.19041.450 | X64 |
| Slower | 583.19 | 3538.60 | 0.16 | Windows 10.0.19041.450 | X64 |
| Slower | 546.58 | 3015.23 | 0.18 | Windows 10.0.19042 | X64 |
| Slower | 665.53 | 3776.10 | 0.18 | Windows 10.0.19041.450 | X64 |
| Slower | 515.15 | 3162.05 | 0.16 | Windows 10.0.19041.450 | X64 |
| Slower | 626.94 | 3928.55 | 0.16 | ubuntu 18.04 | X64 |
| Slower | 630.90 | 4196.01 | 0.15 | manjaro | X64 |
| Slower | 813.80 | 4605.57 | 0.18 | pop 20.04 | X64 |
| Slower | 608.59 | 3587.44 | 0.17 | alpine 3.11 | X64 |
| Slower | 615.67 | 3390.01 | 0.18 | ubuntu 18.04 | X64 |
| Slower | 2148.33 | 10335.71 | 0.21 | ubuntu 16.04 | Arm64 |
| Slower | 2183.77 | 10620.53 | 0.21 | ubuntu 16.04 | Arm64 |
| Slower | 2163.67 | 10815.16 | 0.20 | ubuntu 16.04 | Arm64 |
| Slower | 1176.33 | 11641.04 | 0.10 | ubuntu 18.04 | Arm64 |
| Slower | 1550.48 | 5183.74 | 0.30 | ubuntu 20.04 | Arm64 |
| Slower | 568.67 | 3637.59 | 0.16 | Windows 10.0.18363.959 | X86 |
| Slower | 664.86 | 4576.24 | 0.15 | Windows 10.0.19041.450 | X86 |
| Slower | 972.74 | 8054.46 | 0.12 | Windows 10.0.18363.1016 | Arm |
| Slower | 790.15 | 5171.92 | 0.15 | macOS Catalina 10.15.6 | X64 |
| Slower | 668.62 | 4153.54 | 0.16 | macOS Catalina 10.15.6 | X64 |
| Slower | 743.69 | 4727.58 | 0.16 | macOS Mojave 10.14.5 | X64 |
| Result | Base | Diff | Ratio | Operating System | Bit |
| ------ | --------:| --------:| -----:| ----------------------- | ----- |
| Slower | 53.24 | 26589.31 | 0.00 | Windows 10.0.19041.388 | X64 |
| Slower | 65.47 | 28371.93 | 0.00 | Windows 10.0.18363.959 | X64 |
| Slower | 63.89 | 27952.39 | 0.00 | Windows 10.0.18363.959 | X64 |
| Slower | 75.24 | 35910.74 | 0.00 | Windows 10.0.19041.450 | X64 |
| Slower | 67.29 | 55198.94 | 0.00 | Windows 10.0.19041.450 | X64 |
| Slower | 58.36 | 31008.73 | 0.00 | Windows 10.0.19042 | X64 |
| Slower | 70.38 | 34632.87 | 0.00 | Windows 10.0.19041.450 | X64 |
| Slower | 58.92 | 27533.16 | 0.00 | Windows 10.0.19041.450 | X64 |
| Same | 24197.26 | 24316.40 | 1.00 | ubuntu 18.04 | X64 |
| Same | 23317.93 | 23585.42 | 0.99 | manjaro | X64 |
| Same | 30855.66 | 30176.99 | 1.02 | pop 20.04 | X64 |
| Same | 29081.88 | 28590.29 | 1.02 | alpine 3.11 | X64 |
| Same | 23929.07 | 23728.33 | 1.01 | ubuntu 18.04 | X64 |
| Same | 51918.86 | 51256.87 | 1.01 | ubuntu 16.04 | Arm64 |
| Same | 51674.77 | 51693.86 | 1.00 | ubuntu 16.04 | Arm64 |
| Same | 51690.93 | 52015.88 | 0.99 | ubuntu 16.04 | Arm64 |
| Same | 61071.92 | 43711.17 | 1.40 | ubuntu 18.04 | Arm64 |
| Faster | 43870.66 | 26020.13 | 1.69 | ubuntu 20.04 | Arm64 |
| Slower | 78.42 | 36208.27 | 0.00 | Windows 10.0.18363.959 | X86 |
| Slower | 88.01 | 42312.37 | 0.00 | Windows 10.0.19041.450 | X86 |
| Slower | 104.29 | 57622.86 | 0.00 | Windows 10.0.18363.1016 | Arm |
| Same | 38089.02 | 40079.68 | 0.95 | macOS Catalina 10.15.6 | X64 |
| Same | 32208.09 | 32537.00 | 0.99 | macOS Catalina 10.15.6 | X64 |
| Same | 32575.17 | 32782.69 | 0.99 | macOS Mojave 10.14.5 | X64 |
| Result | Base | Diff | Ratio | Operating System | Bit |
| ------ | -------:| -------:| -----:| ----------------------- | ----- |
| Same | 116.42 | 120.28 | 0.97 | Windows 10.0.19041.388 | X64 |
| Same | 148.25 | 146.53 | 1.01 | Windows 10.0.18363.959 | X64 |
| Same | 144.37 | 144.09 | 1.00 | Windows 10.0.18363.959 | X64 |
| Same | 154.82 | 151.57 | 1.02 | Windows 10.0.19041.450 | X64 |
| Same | 134.57 | 133.40 | 1.01 | Windows 10.0.19041.450 | X64 |
| Same | 122.52 | 119.39 | 1.03 | Windows 10.0.19042 | X64 |
| Same | 154.48 | 150.92 | 1.02 | Windows 10.0.19041.450 | X64 |
| Same | 128.87 | 122.90 | 1.05 | Windows 10.0.19041.450 | X64 |
| Same | 169.50 | 168.46 | 1.01 | ubuntu 18.04 | X64 |
| Faster | 171.67 | 155.11 | 1.11 | manjaro | X64 |
| Same | 179.54 | 175.17 | 1.02 | pop 20.04 | X64 |
| Slower | 146.39 | 203.94 | 0.72 | alpine 3.11 | X64 |
| Same | 179.39 | 180.75 | 0.99 | ubuntu 18.04 | X64 |
| Same | 1068.08 | 1029.35 | 1.04 | ubuntu 16.04 | Arm64 |
| Same | 1066.73 | 1056.79 | 1.01 | ubuntu 16.04 | Arm64 |
| Same | 1111.72 | 1037.54 | 1.07 | ubuntu 16.04 | Arm64 |
| Same | 751.74 | 622.83 | 1.21 | ubuntu 18.04 | Arm64 |
| Faster | 675.51 | 318.18 | 2.12 | ubuntu 20.04 | Arm64 |
| Same | 258.80 | 257.15 | 1.01 | Windows 10.0.18363.959 | X86 |
| Same | 194.61 | 192.96 | 1.01 | Windows 10.0.19041.450 | X86 |
| Same | 486.93 | 508.05 | 0.96 | Windows 10.0.18363.1016 | Arm |
| Same | 200.25 | 203.78 | 0.98 | macOS Catalina 10.15.6 | X64 |
| Same | 168.62 | 163.47 | 1.03 | macOS Catalina 10.15.6 | X64 |
| Same | 174.95 | 177.88 | 0.98 | macOS Mojave 10.14.5 | X64 |
| Result | Base | Diff | Ratio | Operating System | Bit | Processor Name |
| ------ | ------------:| ------------:| -----:| ----------------------- | ----- | --------------------------------------------- |
| Same | 125616750.00 | 125476550.00 | 1.00 | Windows 10.0.19041.388 | X64 | AMD Ryzen 9 3900X |
| Same | 161388400.00 | 156493500.00 | 1.03 | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Same | 154933500.00 | 154730800.00 | 1.00 | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Same | 180481800.00 | 180129900.00 | 1.00 | Windows 10.0.19041.450 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) |
| Slower | 161742300.00 | 211160300.00 | 0.77 | Windows 10.0.19041.450 | X64 | Intel Core i7-6700 CPU 3.40GHz (Skylake) |
| Same | 152928600.00 | 150232700.00 | 1.02 | Windows 10.0.19042 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Same | 206708750.00 | 206860050.00 | 1.00 | Windows 10.0.19041.450 | X64 | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) |
| Slower | 140924300.00 | 185228400.00 | 0.76 | Windows 10.0.19041.450 | X64 | Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) |
| Same | 154948321.00 | 154788579.50 | 1.00 | ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Same | 175860282.50 | 163007313.50 | 1.08 | manjaro | X64 | Intel Core i7-4771 CPU 3.50GHz (Haswell) |
| Slower | 199713880.00 | 255270486.50 | 0.78 | pop 20.04 | X64 | Intel Core i7-6600U CPU 2.60GHz (Skylake) |
| Same | 151256100.00 | 168661900.00 | 0.90 | alpine 3.11 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Same | 171229200.00 | 165843050.00 | 1.03 | ubuntu 18.04 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Same | 503785101.00 | 505992400.50 | 1.00 | ubuntu 16.04 | Arm64 | Unknown processor |
| Same | 503901205.00 | 506190175.00 | 1.00 | ubuntu 16.04 | Arm64 | Unknown processor |
| Same | 504131772.50 | 506220395.00 | 1.00 | ubuntu 16.04 | Arm64 | Unknown processor |
| Same | 473629200.00 | 541631800.00 | 0.87 | ubuntu 18.04 | Arm64 | Unknown processor |
| Same | 331381500.00 | 333779500.00 | 0.99 | ubuntu 20.04 | Arm64 | Unknown processor |
| Same | 246876150.00 | 247010200.00 | 1.00 | Windows 10.0.18363.959 | X86 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Same | 290036150.00 | 289409500.00 | 1.00 | Windows 10.0.19041.450 | X86 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) |
| Same | 418007450.00 | 415404450.00 | 1.01 | Windows 10.0.18363.1016 | Arm | Microsoft SQ1 3.0 GHz |
| Same | 204196936.50 | 204410652.50 | 1.00 | macOS Catalina 10.15.6 | X64 | Intel Core i5-4278U CPU 2.60GHz (Haswell) |
| Same | 176763730.00 | 175647563.50 | 1.01 | macOS Catalina 10.15.6 | X64 | Intel Core i7-4870HQ CPU 2.50GHz (Haswell) |
| Same | 180812724.00 | 184849205.00 | 0.98 | macOS Mojave 10.14.5 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) |
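A note on reading the tables above: the Ratio column matches Base divided by Diff, so values below 1.00 mean the new runtime took longer. The Result column comes from the statistical test rather than from the ratio alone, which is why a row with a ratio of 1.40 can still be reported as Same. The sketch below shows one way the "sort from the biggest regression to the biggest improvement" step could look. The `Row` type and its field names are hypothetical, and the rows are a small sample copied from the tables.

```python
from dataclasses import dataclass


@dataclass
class Row:
    conclusion: str  # "Slower", "Same" or "Faster", as decided by the statistical test
    base: float      # result for the baseline runtime
    diff: float      # result for the runtime under test
    config: str      # operating system and architecture

    @property
    def ratio(self) -> float:
        # Same convention as the tables: below 1.0 means the diff took longer.
        return self.base / self.diff


# A few rows copied from the tables above.
rows = [
    Row("Same", 116.42, 120.28, "Windows 10.0.19041.388 X64"),
    Row("Faster", 675.51, 318.18, "ubuntu 20.04 Arm64"),
    Row("Slower", 570.88, 3069.76, "Windows 10.0.19041.388 X64"),
    Row("Slower", 146.39, 203.94, "alpine 3.11 X64"),
]

# Smallest ratio first: biggest regression at the top, biggest improvement at the bottom.
ranked = sorted(rows, key=lambda r: r.ratio)

for r in ranked:
    print(f"{r.conclusion:<7} {r.base:>10.2f} {r.diff:>10.2f} {r.ratio:>6.2f}  {r.config}")
```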
Using the tool had one major flaw: it was not automated, and hence we found out about regressions only when we searched for them.
This has been recognized and a new project has been started. In 2020 @DrewScoggins started implementing a GitHub bot that uses the data gathered from performance lab (a set of machines owned by the .NET Performance Team) microbenchmark runs to detect and auto-file regressions. So far the bot has been reporting new issues in a dedicated repository, and once a week a workgroup led by @DrewScoggins, consisting of @AndyAyersMS, @kunalspathak, @tannergooding and myself, has been going through the list and triaging the issues. Issues that were deemed actual regressions were labeled as Needs Transfer and were later moved by @DrewScoggins to the runtime repo.
A few weeks ago we were getting close to "code freeze" for .NET 5 and I asked myself a question: are we sure that the bot has reported all possible regressions for all the supported OS versions?
The bot uses different statistical methods to detect regressions, and so far it has been enabled only for Windows 10 x64, Ubuntu 18.04 x64, and Windows 10 x86. So I decided to spend some time and use the old tool that I wrote to verify it. To increase the sample size and get other .NET Libraries Team members involved, I simply asked the Team to run the benchmarks and share the results with me.
Running the performance repo microbenchmarks against the latest .NET Core SDK is super easy thanks to a Python script implemented by @jorive. The script downloads the right SDK and starts benchmarking with cleared environment variables.
git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f netcoreapp3.1 netcoreapp5.0 --filter '*'
The data I received from the .NET Libraries Team members allowed me to cover a big part of the entire matrix of supported configurations:
| Operating System | Arch | Processor Name | Provided by |
| ----------------------- | ----- | --------------------------------------------- | ------------------- |
| Windows 10.0.19041.388 | X64 | AMD Ryzen 9 3900X | @tannergooding |
| Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | @adamsitnik |
| Windows 10.0.19041.450 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | @adamsitnik |
| Windows 10.0.19041.450 | X64 | Intel Core i7-6700 CPU 3.40GHz (Skylake) | @GrabYourPitchforks |
| Windows 10.0.19042 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | @danmosemsft |
| Windows 10.0.19041.450 | X64 | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) | @jeffhandley |
| Windows 10.0.19041.450 | X64 | Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) | @jeffhandley |
| ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | @adamsitnik |
| manjaro | X64 | Intel Core i7-4771 CPU 3.50GHz (Haswell) | @ManickaP |
| pop 20.04 | X64 | Intel Core i7-6600U CPU 2.60GHz (Skylake) | @carlossanlop |
| alpine 3.11 (WSL2) | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | @danmosemsft |
| ubuntu 18.04 (WSL2) | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | @danmosemsft |
| ubuntu 16.04 | Arm64 | Qualcomm Centriq | @adamsitnik |
| ubuntu 18.04 (WSL2) | Arm64 | Microsoft SQ1 3.0 GHz (Surface Pro X) | @carlossanlop |
| ubuntu 20.04 (WSL2) | Arm64 | Microsoft SQ1 3.0 GHz (Surface Pro X) | @pgovind |
| Windows 10.0.18363.959 | X86 | Intel Xeon CPU E5-1650 v4 3.60GHz | @adamsitnik |
| Windows 10.0.19041.450 | X86 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | @adamsitnik |
| Windows 10.0.18363.1016 | Arm | Microsoft SQ1 3.0 GHz (Surface Pro X) | @adamsitnik |
| macOS Catalina 10.15.6 | X64 | Intel Core i5-4278U CPU 2.60GHz (Haswell) | @jeffhandley |
| macOS Catalina 10.15.6 | X64 | Intel Core i7-4870HQ CPU 2.50GHz (Haswell) | @carlossanlop |
| macOS Mojave 10.14.5 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | @adamsitnik |
Everyone interested can download the data from here. The full report generated by the tool is available here.
Moreover, the full historical data turned out to be extremely useful. I've used it every time I was not sure whether something was a regression or just an unstable or multimodal benchmark.
- `System.Collections.Contains*`, `System.Memory.SequenceReader.TryReadTo`, `System.Text.Json.Tests.Perf_Segment.ReadSingleSegmentSequenceByN`
- (... `x86` and `ARM`)
- `System.Collections.CtorGivenSize<Int32>.Array(Size: 512)`
- `System.Numerics.Tests.Perf_Quaternion.Conjugate` and `System.Numerics.Tests.Perf_Quaternion.Negat*`
- `Directory.EnumerateFiles`
- `ByteMark.BenchIDEAEncryption`
- `System.Text.Perf_Utf8Encoding`
- `System.Memory.Slice`
- `PerfLabTests.CastingPerf2.CastingPerf.IntObj`
- `System.Globalization.Tests.StringSearch`: detected by the bot, reported in https://github.com/dotnet/runtime/issues/37819
- `System.Memory.ReadOnlySpan.IndexOfString`: detected by the bot, reported in https://github.com/dotnet/runtime/issues/39724
- `System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: ja)`: detected by the bot, reported in https://github.com/dotnet/runtime/issues/37807
- `System.Globalization.Tests.StringEquality`: detected by the bot, reported in https://github.com/dotnet/runtime/issues/39038 (`OrdinalIgnoreCase` has been optimized in https://github.com/dotnet/runtime/pull/40962)
- `System.Linq.Tests.Perf_Enumerable.FirstWithPredicate_LastElementMatches(input: IOrderedEnumerable)`: `O(N log N)` cost of the OrderBy, see https://github.com/dotnet/runtime/issues/39032#issuecomment-656678750
- `System.Collections.Tests.Perf_BitArray.*(Size: 4)`
- `System.Threading.Tests.Perf_Thread.GetCurrentProcessorId`
- `PerfLabTests.CastingPerf.CheckIsInstAnyIsInterfaceNo`, `PerfLabTests.CastingPerf.CheckObjIsInterfaceNo`
- `System.Net.NetworkInformation.Tests.PhysicalAddressTests.PAShort`
- `System.Numerics.Tests.Perf_Vector*.GetHashCodeBenchmark`
- `System.Net.Primitives.Tests.CredentialCacheTests.ForEach(uriCount: 0, hostPortCount: 0)`
- `System.Tests.Perf_Char.GetUnicodeCategory(c: '?')`
- `PerfLabTests.StackWalk.Walk`
- `System.Tests.Perf_String.Replace_Char(text: "Hello", oldChar: 'l', newChar: '!')`
- `System.Text.Perf_Utf8String.IsAscii(Input: EnglishAllAscii)`: `Utf8String` is still only experimental
- `System.Text.Encodings.Web.Tests.Perf_Encoders.EncodeUtf8`
There were of course more of them; here are the ones that I've noted to use as Contract Tests in the near future (to reduce the noise produced by the bot):
- `System.Buffers.Tests.RentReturnArrayPoolTests<Byte>.ProducerConsumer`
- `System.Memory.ReadOnlySequence.Slice_Repeat_StartPosition_And_EndPosition(Segment: Multiple)`
- `PerfLabTests.BlockCopyPerf.CallBlockCopy`
- `System.Tests.Perf_String.Trim_CharArr(s: "Test", c: [' ', ' '])`
- `System.Threading.Tests.Perf_Interlocked.CompareExchange_long`: usually `10ns`, but sometimes `x100` that. Only for `x86`. I need logs to verify whether it's a BDN bug or not.
- `System.Memory.Span<Int32>.IndexOfValue(Size: 512)`
- `Benchstone.BenchI.Fib.Test`
- Testing on `GNU libc` based Linux distros like Ubuntu is not enough to detect `musl libc` specific regressions. We should consider adding Alpine runs to the perf lab.
- No important issues specific to `macOS` and different CPU families were discovered. It has proven that the perf lab has good hardware coverage.
- The Alpine regression has shown that an increased number of Gen 0 collections can be a very valuable metric to detect regressions. We should consider extending the bot to use it.
Big thanks to everyone involved!
Great work, @adamsitnik. I am pleased that our systems have improved since last cycle, and that you and @DrewScoggins will be using this exercise to improve them so that more issues are found earlier and not at the end of the cycle.
Cc @Lxiamail @jkotas
> no important issues specific to macOS and different CPU families were discovered. It has proven that the perf lab has good hardware coverage.
I thought the perf lab did not cover Mac and only had one type of CPU. How can we catch such regressions before the end of the cycle?
> thought the perf lab did not cover Mac and only had one type of CPU.
You are right.
> no important issues specific to macOS and different CPU families were discovered. It has proven that the perf lab has good hardware coverage.
I meant that in my study I gathered results for macOS and different CPU families but did not find any important regressions (specific to macOS etc.), so adding macOS or more types of CPUs to the perf lab is not needed.
Could an effort like this (a script run over manually collected submissions) be made cheap enough that we could do it occasionally during the cycle?
@adamsitnik Great job! Based on the data from this exercise, we will add Alpine to the .NET perf lab.
Is the collected data internal only or available publicly as well?
> Could an effort like this (a script run over manually collected submissions) be made cheap enough that we could do it occasionally during the cycle?
We talked about this offline, but sharing the comment here too... We'll be putting effort into that between now and the first 6.0 previews, with the goal of completing targeted manual runs for each of the 6.0 Preview/RC releases.
> > thought the perf lab did not cover Mac and only had one type of CPU.
>
> You are right.
>
> > no important issues specific to macOS and different CPU families were discovered. It has proven that the perf lab has good hardware coverage.
>
> I meant that in my study I have gathered macOS and different CPU families results but I did not find any important regressions (specific to macOS etc), so adding macOS or more types of CPUs to perf lab is not needed
Just for clarity here: we will have a brand new batch of AMD hardware this calendar year. So we will not be covering macOS in the lab, but we will have both Intel and AMD hardware for coverage.
@DrewScoggins, do we know what the hardware specs are?
I do not, @billwert did the work of speccing out the machines.