Runtime: .NET 5.0 Microbenchmarks Performance Study Report

Created on 4 Sep 2020 · 10 Comments · Source: dotnet/runtime

Goals

The main goal of my study was to ensure that we ship .NET 5.0 without any performance regressions and to validate whether in the near future we can fully rely on the regression auto-filing bot written by @DrewScoggins.
My other goal was to get .NET Libraries Team members involved and keep on growing the performance culture.

tl;dr: The bot is doing a great job of detecting regressions. Most serious regressions have already been fixed; however, a few investigations are still in progress.

Methodology (and how it evolved)

In 2018 I had the pleasure of reviewing @AndreyAkinshin's "Pro .NET Benchmarking" book. The "Statistics for Performance Engineers" and "Performance Analysis and Performance Testing" chapters inspired me to implement a small tool called Results Comparer. The tool uses the Mann-Whitney U statistical test to detect performance regressions in results exported by BenchmarkDotNet. It's being used (or at least it should be) as part of our benchmarking workflow to prevent introducing regressions to .NET.
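To make the statistical idea concrete, here is a minimal sketch of a one-sided Mann-Whitney U test on two sets of timings. It is not the Results Comparer source: the function name, the normal approximation, and the sample numbers are assumptions chosen for illustration, and the real tool also applies a user-supplied threshold that is not modeled here.

```python
import math
from statistics import NormalDist


def mann_whitney_u(base, diff):
    """Return (U statistic of `base`, one-sided p-value that `diff` is slower).

    Hypothetical helper: uses the normal approximation with tie and
    continuity corrections, which is reasonable for samples of ~8+ values.
    """
    base, diff = list(base), list(diff)
    n1, n2 = len(base), len(diff)
    n = n1 + n2
    # Rank the pooled sample; tied values share their average rank.
    pooled = sorted((value, idx) for idx, value in enumerate(base + diff))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    # U counts the (base, diff) pairs where the base timing is the larger one.
    u_base = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_base, 1.0  # all values identical: no evidence either way
    # A small U means base timings are mostly smaller, i.e. diff is slower.
    z = (u_base - mean_u + 0.5) / math.sqrt(var_u)
    return u_base, NormalDist().cdf(z)


if __name__ == "__main__":
    base_ns = [100.0, 101.0, 99.0, 102.0, 98.0, 100.5, 99.5, 101.5]
    diff_ns = [120.0, 121.0, 119.0, 122.0, 118.0, 120.5, 119.5, 121.5]
    u, p = mann_whitney_u(base_ns, diff_ns)
    print(f"U={u}, p={p:.5f}", "-> Slower" if p < 0.01 else "-> Same")
```

Because the test works on ranks rather than means, a single outlier iteration does not flip the verdict, which is presumably why it suits noisy microbenchmark data.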

In 2019 I was asked by @danmosemsft to verify .NET Core 3.0 performance. Initially, I ran all the microbenchmarks from the dotnet/performance repository on a single machine with dual boot for Windows 10 and Ubuntu 18.04 x64 and used the Results Comparer to find regressions. It very quickly turned out that such a sample was way too small to make sure that we don't have any regressions. Some benchmarks were simply unstable, and some architectures, like ARM and ARM64, were not covered at all. Other Linux distros and CPU families were also not covered.

Then I ran the benchmarks on all the PCs, laptops, and VMs that I could access. But I was still missing AMD and ARM results, so I asked @tannergooding and @BruceForstall for help. @tannergooding ran the benchmarks on all his AMD machines. @BruceForstall gave me access to a document that explains how to use the ARM machines owned by the JIT Team. This turned out to be invaluable help, as I've used these machines many, many times, including this year during the 5.0 investigation.

After gathering enough samples to cover our matrix of supported OSes and architectures, I built a simple console app on top of ResultsComparer (source code available here). The tool uses the very same statistical test to detect regressions, aggregates the results from all the different configurations, and sorts them from the biggest regression to the biggest improvement.
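The aggregate-and-sort step can be sketched roughly as follows. This is not the tool's source code: the `summarize` function, the input shape, and the use of a geometric mean to rank benchmarks are all assumptions for illustration. The only detail taken from the report is the ratio convention (Base divided by Diff, so values below 1.0 mean slower).

```python
from statistics import geometric_mean


def summarize(results):
    """Rank benchmarks from biggest regression to biggest improvement.

    `results` maps a benchmark name to a list of (base, diff) timings,
    one pair per configuration (hypothetical input shape).
    """
    summary = []
    for name, pairs in results.items():
        # Same convention as the Ratio column: base / diff, below 1.0 is slower.
        ratios = [base / diff for base, diff in pairs]
        summary.append((name, geometric_mean(ratios)))
    # Ascending order puts the smallest ratio (worst regression) first.
    summary.sort(key=lambda item: item[1])
    return summary


if __name__ == "__main__":
    sample = {
        "Benchmark.A": [(100.0, 200.0), (100.0, 400.0)],  # slower everywhere
        "Benchmark.B": [(100.0, 50.0)],                    # faster
        "Benchmark.C": [(100.0, 100.0)],                   # unchanged
    }
    for name, ratio in summarize(sample):
        print(f"{ratio:5.2f}  {name}")
```

A geometric mean is used here because ratios compose multiplicatively; a real implementation might rank by the worst configuration instead.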

Such an approach allows for very quick identification of regressions of all kinds:

affecting every configuration

System.Linq.Tests.Perf_Enumerable.FirstWithPredicate_LastElementMatches(input: IOrderedEnumerable)

| Result | Base | Diff | Ratio | Operating System | Bit |
| ------ | -------:| --------:| -----:| ----------------------- | ----- |
| Slower | 570.88 | 3069.76 | 0.19 | Windows 10.0.19041.388 | X64 |
| Slower | 610.20 | 3674.19 | 0.17 | Windows 10.0.18363.959 | X64 |
| Slower | 598.37 | 3519.26 | 0.17 | Windows 10.0.18363.959 | X64 |
| Slower | 700.86 | 4238.85 | 0.17 | Windows 10.0.19041.450 | X64 |
| Slower | 583.19 | 3538.60 | 0.16 | Windows 10.0.19041.450 | X64 |
| Slower | 546.58 | 3015.23 | 0.18 | Windows 10.0.19042 | X64 |
| Slower | 665.53 | 3776.10 | 0.18 | Windows 10.0.19041.450 | X64 |
| Slower | 515.15 | 3162.05 | 0.16 | Windows 10.0.19041.450 | X64 |
| Slower | 626.94 | 3928.55 | 0.16 | ubuntu 18.04 | X64 |
| Slower | 630.90 | 4196.01 | 0.15 | manjaro | X64 |
| Slower | 813.80 | 4605.57 | 0.18 | pop 20.04 | X64 |
| Slower | 608.59 | 3587.44 | 0.17 | alpine 3.11 | X64 |
| Slower | 615.67 | 3390.01 | 0.18 | ubuntu 18.04 | X64 |
| Slower | 2148.33 | 10335.71 | 0.21 | ubuntu 16.04 | Arm64 |
| Slower | 2183.77 | 10620.53 | 0.21 | ubuntu 16.04 | Arm64 |
| Slower | 2163.67 | 10815.16 | 0.20 | ubuntu 16.04 | Arm64 |
| Slower | 1176.33 | 11641.04 | 0.10 | ubuntu 18.04 | Arm64 |
| Slower | 1550.48 | 5183.74 | 0.30 | ubuntu 20.04 | Arm64 |
| Slower | 568.67 | 3637.59 | 0.16 | Windows 10.0.18363.959 | X86 |
| Slower | 664.86 | 4576.24 | 0.15 | Windows 10.0.19041.450 | X86 |
| Slower | 972.74 | 8054.46 | 0.12 | Windows 10.0.18363.1016 | Arm |
| Slower | 790.15 | 5171.92 | 0.15 | macOS Catalina 10.15.6 | X64 |
| Slower | 668.62 | 4153.54 | 0.16 | macOS Catalina 10.15.6 | X64 |
| Slower | 743.69 | 4727.58 | 0.16 | macOS Mojave 10.14.5 | X64 |

affecting specific OS families (Windows, Unix)

System.Globalization.Tests.StringSearch.IsPrefix_DifferentFirstChar(Options: (en-US, IgnoreSymbols, False))

| Result | Base | Diff | Ratio | Operating System | Bit |
| ------ | --------:| --------:| -----:| ----------------------- | ----- |
| Slower | 53.24 | 26589.31 | 0.00 | Windows 10.0.19041.388 | X64 |
| Slower | 65.47 | 28371.93 | 0.00 | Windows 10.0.18363.959 | X64 |
| Slower | 63.89 | 27952.39 | 0.00 | Windows 10.0.18363.959 | X64 |
| Slower | 75.24 | 35910.74 | 0.00 | Windows 10.0.19041.450 | X64 |
| Slower | 67.29 | 55198.94 | 0.00 | Windows 10.0.19041.450 | X64 |
| Slower | 58.36 | 31008.73 | 0.00 | Windows 10.0.19042 | X64 |
| Slower | 70.38 | 34632.87 | 0.00 | Windows 10.0.19041.450 | X64 |
| Slower | 58.92 | 27533.16 | 0.00 | Windows 10.0.19041.450 | X64 |
| Same | 24197.26 | 24316.40 | 1.00 | ubuntu 18.04 | X64 |
| Same | 23317.93 | 23585.42 | 0.99 | manjaro | X64 |
| Same | 30855.66 | 30176.99 | 1.02 | pop 20.04 | X64 |
| Same | 29081.88 | 28590.29 | 1.02 | alpine 3.11 | X64 |
| Same | 23929.07 | 23728.33 | 1.01 | ubuntu 18.04 | X64 |
| Same | 51918.86 | 51256.87 | 1.01 | ubuntu 16.04 | Arm64 |
| Same | 51674.77 | 51693.86 | 1.00 | ubuntu 16.04 | Arm64 |
| Same | 51690.93 | 52015.88 | 0.99 | ubuntu 16.04 | Arm64 |
| Same | 61071.92 | 43711.17 | 1.40 | ubuntu 18.04 | Arm64 |
| Faster | 43870.66 | 26020.13 | 1.69 | ubuntu 20.04 | Arm64 |
| Slower | 78.42 | 36208.27 | 0.00 | Windows 10.0.18363.959 | X86 |
| Slower | 88.01 | 42312.37 | 0.00 | Windows 10.0.19041.450 | X86 |
| Slower | 104.29 | 57622.86 | 0.00 | Windows 10.0.18363.1016 | Arm |
| Same | 38089.02 | 40079.68 | 0.95 | macOS Catalina 10.15.6 | X64 |
| Same | 32208.09 | 32537.00 | 0.99 | macOS Catalina 10.15.6 | X64 |
| Same | 32575.17 | 32782.69 | 0.99 | macOS Mojave 10.14.5 | X64 |

affecting specific Linux distros

System.Threading.Tests.Perf_CancellationToken.Cancel

| Result | Base | Diff | Ratio | Operating System | Bit |
| ------ | -------:| -------:| -----:| ----------------------- | ----- |
| Same | 116.42 | 120.28 | 0.97 | Windows 10.0.19041.388 | X64 |
| Same | 148.25 | 146.53 | 1.01 | Windows 10.0.18363.959 | X64 |
| Same | 144.37 | 144.09 | 1.00 | Windows 10.0.18363.959 | X64 |
| Same | 154.82 | 151.57 | 1.02 | Windows 10.0.19041.450 | X64 |
| Same | 134.57 | 133.40 | 1.01 | Windows 10.0.19041.450 | X64 |
| Same | 122.52 | 119.39 | 1.03 | Windows 10.0.19042 | X64 |
| Same | 154.48 | 150.92 | 1.02 | Windows 10.0.19041.450 | X64 |
| Same | 128.87 | 122.90 | 1.05 | Windows 10.0.19041.450 | X64 |
| Same | 169.50 | 168.46 | 1.01 | ubuntu 18.04 | X64 |
| Faster | 171.67 | 155.11 | 1.11 | manjaro | X64 |
| Same | 179.54 | 175.17 | 1.02 | pop 20.04 | X64 |
| Slower | 146.39 | 203.94 | 0.72 | alpine 3.11 | X64 |
| Same | 179.39 | 180.75 | 0.99 | ubuntu 18.04 | X64 |
| Same | 1068.08 | 1029.35 | 1.04 | ubuntu 16.04 | Arm64 |
| Same | 1066.73 | 1056.79 | 1.01 | ubuntu 16.04 | Arm64 |
| Same | 1111.72 | 1037.54 | 1.07 | ubuntu 16.04 | Arm64 |
| Same | 751.74 | 622.83 | 1.21 | ubuntu 18.04 | Arm64 |
| Faster | 675.51 | 318.18 | 2.12 | ubuntu 20.04 | Arm64 |
| Same | 258.80 | 257.15 | 1.01 | Windows 10.0.18363.959 | X86 |
| Same | 194.61 | 192.96 | 1.01 | Windows 10.0.19041.450 | X86 |
| Same | 486.93 | 508.05 | 0.96 | Windows 10.0.18363.1016 | Arm |
| Same | 200.25 | 203.78 | 0.98 | macOS Catalina 10.15.6 | X64 |
| Same | 168.62 | 163.47 | 1.03 | macOS Catalina 10.15.6 | X64 |
| Same | 174.95 | 177.88 | 0.98 | macOS Mojave 10.14.5 | X64 |

affecting specific CPU families

System.Buffers.Text.Tests.Base64EncodeDecodeInPlaceTests.Base64EncodeInPlace(NumberOfBytes: 200000000)

| Result | Base | Diff | Ratio | Operating System | Bit | Processor Name |
| ------ | ------------:| ------------:| -----:| ----------------------- | ----- | --------------------------------------------- |
| Same | 125616750.00 | 125476550.00 | 1.00 | Windows 10.0.19041.388 | X64 | AMD Ryzen 9 3900X |
| Same | 161388400.00 | 156493500.00 | 1.03 | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Same | 154933500.00 | 154730800.00 | 1.00 | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Same | 180481800.00 | 180129900.00 | 1.00 | Windows 10.0.19041.450 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) |
| Slower | 161742300.00 | 211160300.00 | 0.77 | Windows 10.0.19041.450 | X64 | Intel Core i7-6700 CPU 3.40GHz (Skylake) |
| Same | 152928600.00 | 150232700.00 | 1.02 | Windows 10.0.19042 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Same | 206708750.00 | 206860050.00 | 1.00 | Windows 10.0.19041.450 | X64 | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) |
| Slower | 140924300.00 | 185228400.00 | 0.76 | Windows 10.0.19041.450 | X64 | Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) |
| Same | 154948321.00 | 154788579.50 | 1.00 | ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Same | 175860282.50 | 163007313.50 | 1.08 | manjaro | X64 | Intel Core i7-4771 CPU 3.50GHz (Haswell) |
| Slower | 199713880.00 | 255270486.50 | 0.78 | pop 20.04 | X64 | Intel Core i7-6600U CPU 2.60GHz (Skylake) |
| Same | 151256100.00 | 168661900.00 | 0.90 | alpine 3.11 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Same | 171229200.00 | 165843050.00 | 1.03 | ubuntu 18.04 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Same | 503785101.00 | 505992400.50 | 1.00 | ubuntu 16.04 | Arm64 | Unknown processor |
| Same | 503901205.00 | 506190175.00 | 1.00 | ubuntu 16.04 | Arm64 | Unknown processor |
| Same | 504131772.50 | 506220395.00 | 1.00 | ubuntu 16.04 | Arm64 | Unknown processor |
| Same | 473629200.00 | 541631800.00 | 0.87 | ubuntu 18.04 | Arm64 | Unknown processor |
| Same | 331381500.00 | 333779500.00 | 0.99 | ubuntu 20.04 | Arm64 | Unknown processor |
| Same | 246876150.00 | 247010200.00 | 1.00 | Windows 10.0.18363.959 | X86 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Same | 290036150.00 | 289409500.00 | 1.00 | Windows 10.0.19041.450 | X86 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) |
| Same | 418007450.00 | 415404450.00 | 1.01 | Windows 10.0.18363.1016 | Arm | Microsoft SQ1 3.0 GHz |
| Same | 204196936.50 | 204410652.50 | 1.00 | macOS Catalina 10.15.6 | X64 | Intel Core i5-4278U CPU 2.60GHz (Haswell) |
| Same | 176763730.00 | 175647563.50 | 1.01 | macOS Catalina 10.15.6 | X64 | Intel Core i7-4870HQ CPU 2.50GHz (Haswell) |
| Same | 180812724.00 | 184849205.00 | 0.98 | macOS Mojave 10.14.5 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) |
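One way to tell these four kinds of regressions apart programmatically is to group the per-configuration verdicts along a single dimension and look at the share of "Slower" rows in each group. The sketch below assumes each table row has already been parsed into a dictionary; the `os_family` key and the function name are hypothetical and not part of the tool.

```python
from collections import defaultdict


def regressed_share_by(rows, dimension):
    """Return {dimension value: fraction of rows whose result is 'Slower'}."""
    totals = defaultdict(int)
    slower = defaultdict(int)
    for row in rows:
        key = row[dimension]
        totals[key] += 1
        if row["result"] == "Slower":
            slower[key] += 1
    return {key: slower[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # A few rows shaped like the IsPrefix_DifferentFirstChar table above.
    rows = [
        {"result": "Slower", "os_family": "Windows", "arch": "X64"},
        {"result": "Slower", "os_family": "Windows", "arch": "X86"},
        {"result": "Same", "os_family": "Linux", "arch": "X64"},
        {"result": "Faster", "os_family": "Linux", "arch": "Arm64"},
        {"result": "Same", "os_family": "macOS", "arch": "X64"},
    ]
    print(regressed_share_by(rows, "os_family"))
```

A share close to 1.0 for one value and close to 0.0 for the others suggests the regression is specific to that value; running the same grouping over OS family, distro, architecture, and processor name covers the four cases shown above.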

Using the tool had one major flaw: it was not automated and hence we were finding out about the regressions only when we searched for them.

This has been recognized and a new project has been started. In 2020 @DrewScoggins started implementing a GitHub bot that uses the data gathered from performance lab (a set of machines owned by the .NET Performance Team) microbenchmark runs to detect and auto-file regressions. So far the bot has been reporting new issues in a dedicated repository, and once a week the workgroup led by @DrewScoggins, consisting of @AndyAyersMS, @kunalspathak, @tannergooding, and myself, has been going through the list and triaging the issues. Issues that were deemed actual regressions were labeled as Needs Transfer and were later moved by @DrewScoggins to the runtime repo.

A few weeks ago we were getting close to "code freeze" for .NET 5 and I asked myself a question: are we sure that the bot has reported all possible regressions for all the supported OS versions?

The bot uses different statistical methods to detect regressions, and so far it has been enabled only for Windows 10 x64, Ubuntu 18.04 x64, and Windows 10 x86. So I decided to spend some time and use the old tool that I wrote to verify it. To increase the sample size and get other .NET Libraries Team members involved, I simply asked the Team to run the benchmarks and share the results with me.

Running the performance repo microbenchmarks against the latest .NET Core SDK is super easy thanks to a Python script implemented by @jorive. The script downloads the right SDK and starts benchmarking with cleared environment variables.

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f netcoreapp3.1 netcoreapp5.0 --filter '*'

Data

The data I received from the .NET Libraries Team members allowed me to cover a big part of the entire matrix of supported configurations:

| Operating System | Arch | Processor Name | Provided by |
| ----------------------- | ----- | --------------------------------------------- | ------------------- |
| Windows 10.0.19041.388 | X64 | AMD Ryzen 9 3900X | @tannergooding |
| Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | @adamsitnik |
| Windows 10.0.19041.450 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | @adamsitnik |
| Windows 10.0.19041.450 | X64 | Intel Core i7-6700 CPU 3.40GHz (Skylake) | @GrabYourPitchforks |
| Windows 10.0.19042 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | @danmosemsft |
| Windows 10.0.19041.450 | X64 | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) | @jeffhandley |
| Windows 10.0.19041.450 | X64 | Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) | @jeffhandley |
| ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | @adamsitnik |
| manjaro | X64 | Intel Core i7-4771 CPU 3.50GHz (Haswell) | @ManickaP |
| pop 20.04 | X64 | Intel Core i7-6600U CPU 2.60GHz (Skylake) | @carlossanlop |
| alpine 3.11 (WSL2) | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | @danmosemsft |
| ubuntu 18.04 (WSL2) | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | @danmosemsft |
| ubuntu 16.04 | Arm64 | Qualcomm Centriq | @adamsitnik |
| ubuntu 18.04 (WSL2) | Arm64 | Microsoft SQ1 3.0 GHz (Surface Pro X) | @carlossanlop |
| ubuntu 20.04 (WSL2) | Arm64 | Microsoft SQ1 3.0 GHz (Surface Pro X) | @pgovind |
| Windows 10.0.18363.959 | X86 | Intel Xeon CPU E5-1650 v4 3.60GHz | @adamsitnik |
| Windows 10.0.19041.450 | X86 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | @adamsitnik |
| Windows 10.0.18363.1016 | Arm | Microsoft SQ1 3.0 GHz (Surface Pro X) | @adamsitnik |
| macOS Catalina 10.15.6 | X64 | Intel Core i5-4278U CPU 2.60GHz (Haswell) | @jeffhandley |
| macOS Catalina 10.15.6 | X64 | Intel Core i7-4870HQ CPU 2.50GHz (Haswell) | @carlossanlop |
| macOS Mojave 10.14.5 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | @adamsitnik |

Everyone interested can download the data from here. The full report generated by the tool is available here.

Moreover, the full historical data turned out to be extremely useful. I used it every time I was not sure whether something was a regression or just an unstable or multimodal benchmark.

Regressions

Already fixed

[x] System.Collections.Contains*, System.Memory.SequenceReader.TryReadTo, System.Text.Json.Tests.Perf_Segment.ReadSingleSegmentSequenceByN
- was a 32 bit issue only (both x86 and ARM)
- detected by the bot, reported in https://github.com/DrewScoggins/performance-2/issues/910#issuecomment-676413559
- confirmed: https://github.com/DrewScoggins/performance-2/issues/910#issuecomment-677854221
- transferred to runtime repo: https://github.com/dotnet/runtime/issues/41167
- fixed in https://github.com/dotnet/runtime/pull/41198
- backported to 5.0 in https://github.com/dotnet/runtime/pull/41254
[x] System.Collections.CtorGivenSize<Int32>.Array(Size: 512)
- specific to Alpine only
- created an issue https://github.com/dotnet/runtime/issues/41398
- confirmed by @jkotas to be not WSL specific, but a much bigger Alpine perf problem
- it has shown that an increased number of Gen 0 collections is a valuable metric to detect regressions
- fixed in https://github.com/dotnet/runtime/pull/41532
- backported to 5.0 https://github.com/dotnet/runtime/pull/41547
- created https://github.com/dotnet/runtime/issues/41708 to add unit tests that ensure that this problem is not coming back
[x] System.Numerics.Tests.Perf_Quaternion.Conjugate and System.Numerics.Tests.Perf_Quaternion.Negat*
- not reported by the bot because it's a brand new benchmark and we did not have historical data at the time of my investigation
- issue created: https://github.com/dotnet/runtime/issues/41738
- fixed in https://github.com/dotnet/runtime/pull/41829
- backported to 5.0-rc2 in https://github.com/dotnet/runtime/pull/41885
[x] Directory.EnumerateFiles
- not reported by the bot, most probably because it was a very fresh regression
- issue created: https://github.com/dotnet/runtime/issues/41739
- fixed in https://github.com/dotnet/runtime/issues/41739
- backported to 5.0-rc2 in https://github.com/dotnet/runtime/pull/41820
[x] ByteMark.BenchIDEAEncryption
- not reported by the bot, most probably because it was a very fresh regression
- issue created: https://github.com/dotnet/runtime/issues/41677
- fixed in https://github.com/dotnet/runtime/pull/40871
- backported to 5.0-rc2 in https://github.com/dotnet/runtime/pull/41838
[x] System.Text.Perf_Utf8Encoding
- not detected by the bot because it was not enabled for ARM yet
- issue created: https://github.com/dotnet/runtime/issues/41699
- fixed in https://github.com/dotnet/runtime/pull/42052
- backported to 5.0-rc2 in #42064

Investigation in progress

[ ] System.Memory.Slice
- not detected by the bot because it was not enabled for ARM yet
- seems to be ARM64-specific, created an issue https://github.com/dotnet/runtime/issues/41704
- investigation is in progress
[ ] PerfLabTests.CastingPerf2.CastingPerf.IntObj
- not detected by the bot because it was not enabled for ARM yet
- seems to be ARM64-specific, created an issue https://github.com/dotnet/runtime/issues/41706
- investigation is in progress

By design or Acceptable

[ ] ICU-related regressions
- System.Globalization.Tests.StringSearch: detected by the bot, reported in https://github.com/dotnet/runtime/issues/37819
- System.Memory.ReadOnlySpan.IndexOfString: detected by the bot, reported in https://github.com/dotnet/runtime/issues/39724
- System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: ja): detected by the bot, reported in https://github.com/dotnet/runtime/issues/37807
- System.Globalization.Tests.StringEquality: detected by the bot, reported in https://github.com/dotnet/runtime/issues/39038
- I've created one uber issue to track all of them in one place: https://github.com/dotnet/runtime/issues/40942
- OrdinalIgnoreCase has been optimized in https://github.com/dotnet/runtime/pull/40962
- TODO: doc update still required
[x] System.Linq.Tests.Perf_Enumerable.FirstWithPredicate_LastElementMatches(input: IOrderedEnumerable)
- detected by the bot, reported in https://github.com/dotnet/runtime/issues/39032
- closed, by design: removed the O(N log N) cost of the OrderBy https://github.com/dotnet/runtime/issues/39032#issuecomment-656678750
[x] System.Collections.Tests.Perf_BitArray.*(Size: 4)
- detected by the bot, reported in https://github.com/dotnet/runtime/issues/37813
- closed, by design: introduction of vectorization has increased the cost of operations for small inputs: https://github.com/dotnet/runtime/issues/37813#issuecomment-656370853
[x] System.Threading.Tests.Perf_Thread.GetCurrentProcessorId
- detected by the bot, reported in https://github.com/dotnet/runtime/issues/37804
- closed, by design: precision was improved at a cost of acceptable minor perf regression: https://github.com/dotnet/runtime/issues/37804#issuecomment-643448336
[x] PerfLabTests.CastingPerf.CheckIsInstAnyIsInterfaceNo, PerfLabTests.CastingPerf.CheckObjIsInterfaceNo
- detected by the bot, reported in https://github.com/dotnet/runtime/issues/37803
- closed, by design: known tradeoff: https://github.com/dotnet/runtime/issues/37803#issuecomment-670209689
[x] System.Net.NetworkInformation.Tests.PhysicalAddressTests.PAShort
- detected by the bot, reported in https://github.com/dotnet/runtime/issues/39720
- closed, acceptable for improved code reuse https://github.com/dotnet/runtime/issues/39720#issuecomment-671535578
- benchmark for 1 byte removed, added 6 bytes in https://github.com/dotnet/performance/pull/1490
[x] System.Numerics.Tests.Perf_Vector*.GetHashCodeBenchmark
- detected by the bot, reported in https://github.com/dotnet/runtime/issues/39035 and https://github.com/dotnet/runtime/issues/39029
- closed, "it should not be used" https://github.com/dotnet/runtime/issues/39029#issuecomment-656408464
[ ] System.Net.Primitives.Tests.CredentialCacheTests.ForEach(uriCount: 0, hostPortCount: 0)
- detected by the bot, reported in https://github.com/DrewScoggins/performance-2/issues/510
- confirmed: https://github.com/DrewScoggins/performance-2/issues/510#issuecomment-680783031
- awaiting the transfer to runtime repo. Most probably a by-design regression.

Moved to 6.0

[x] System.Tests.Perf_Char.GetUnicodeCategory(c: '?')
- detected and reported by the bot in https://github.com/DrewScoggins/performance-2/issues/574, I've created https://github.com/dotnet/runtime/issues/41107
- minor regression for non-ASCII characters, moved to 6.0
[x] PerfLabTests.StackWalk.Walk
- detected by the bot and reported in https://github.com/dotnet/runtime/issues/39115
- confirmed in https://github.com/dotnet/runtime/issues/39115#issuecomment-677857078
- specific to everything that is not Windows x64, rather not critical -> moved to 6.0: https://github.com/dotnet/runtime/issues/39115#issuecomment-682684126
[x] System.Tests.Perf_String.Replace_Char(text: "Hello", oldChar: 'l', newChar: '!')
- reported in https://github.com/dotnet/runtime/issues/37816
- confirmed in https://github.com/dotnet/runtime/issues/37816#issuecomment-680706314
- moved to 6.0
[x] System.Text.Perf_Utf8String.IsAscii(Input: EnglishAllAscii)
- not reported by the bot because it was a brand new benchmark and we did not have historical data at the time of my investigation
- issue created: https://github.com/dotnet/runtime/issues/41388
- moved to 6.0 as Utf8String is still only experimental
[x] System.Text.Encodings.Web.Tests.Perf_Encoders.EncodeUtf8
- not reported by the bot because it was a brand new benchmark and we did not have historical data at the time of my investigation
- issue created: https://github.com/dotnet/runtime/issues/41104
- moved to 6.0

Unstable or multimodal benchmarks

There were of course more of them; here are the ones that I've noted to use as Contract Tests in the near future (to reduce the noise produced by the bot):

System.Buffers.Tests.RentReturnArrayPoolTests<Byte>.ProducerConsumer
- detected by the bot, reported in https://github.com/dotnet/runtime/issues/39031
- asked for historical data to verify if it's multimodal or not https://github.com/dotnet/runtime/issues/39031#issuecomment-680712003
- thanks to historical data provided it was possible to tell that it's unstable for x64 and bimodal for x86: https://github.com/dotnet/runtime/issues/39031#issuecomment-682207270
System.Memory.ReadOnlySequence.Slice_Repeat_StartPosition_And_EndPosition(Segment: Multiple)
- quite unstable benchmark, I've verified that 5.0 codegen is better
PerfLabTests.BlockCopyPerf.CallBlockCopy
- detected by the bot, reported in https://github.com/dotnet/runtime/issues/37808
- copying 0 elements does not add value: https://github.com/dotnet/runtime/issues/37808#issuecomment-654424436
- test case for copying 0 elements removed in https://github.com/dotnet/performance/pull/1465
- closed as unstable based on full historical data: https://github.com/dotnet/runtime/issues/37808#issuecomment-685014522
System.Tests.Perf_String.Trim_CharArr(s: "Test", c: [' ', ' '])
- multimodal benchmark, needs a rewrite as stated a long time ago: https://github.com/dotnet/runtime/issues/13135
System.Threading.Tests.Perf_Interlocked.CompareExchange_long
- the benchmark typically reports 10 ns, but sometimes 100x that, and only on x86. I need logs to verify whether it's a BDN bug or not.
- issue created https://github.com/dotnet/performance/issues/1497
System.Memory.Span<Int32>.IndexOfValue(Size: 512)
- reported in https://github.com/dotnet/runtime/issues/39722
- confirmed that it was due to code alignment change in https://github.com/dotnet/runtime/issues/39722#issuecomment-674999435
Benchstone.BenchI.Fib.Test
- perfectly multimodal, great example for a contract test
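A rough way to screen historical results for the bimodal pattern described above is to look for one wide gap that splits the measurements into two well-populated clusters. The heuristic below is only an illustrative sketch: it is not the bot's logic and not BenchmarkDotNet's own multimodality detection, and the function name and both default parameters are assumptions.

```python
def looks_bimodal(samples, min_share=0.2, gap_factor=3.0):
    """Heuristic check for two distinct clusters in a list of timings.

    Returns True when the widest gap between neighbouring sorted values is
    much larger than the spread inside either cluster, and both clusters
    hold at least `min_share` of the samples.
    """
    values = sorted(samples)
    if len(values) < 4:
        return False
    # Find the widest gap between neighbouring measurements.
    gaps = [(values[i + 1] - values[i], i) for i in range(len(values) - 1)]
    widest, split = max(gaps)
    low, high = values[: split + 1], values[split + 1:]
    # A lone outlier is noise, not a second mode.
    if min(len(low), len(high)) < min_share * len(values):
        return False
    # The gap must dwarf the spread inside either cluster.
    spread = max(low[-1] - low[0], high[-1] - high[0])
    return widest > gap_factor * spread


if __name__ == "__main__":
    history = [10.0, 10.2, 9.9, 10.1, 20.1, 19.8, 20.0, 20.3]
    print(looks_bimodal(history))
```

A benchmark that mostly reports one value with rare huge spikes (like the CompareExchange_long case above) would not be flagged by this check, because the second cluster is too small; that pattern points to instability or a harness problem rather than a second mode.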

Summary

The bot has reported all major performance issues for the configurations that it was enabled for (Windows x64, x86, and Ubuntu x64). Great work @DrewScoggins!
The full historical data turned out to be extremely useful to exclude all false positives for multimodal and unstable benchmarks.
We missed one important x86 bug during triaging (human error), but it got discovered during the study (https://github.com/dotnet/runtime/issues/41167#issuecomment-679285756). To avoid such problems in the future and to enable the bot in the runtime repo, the noise of the bot needs to be reduced. Currently, it's quite high, mostly due to the multimodal nature of the benchmarks.
The study detected a relatively large number of new ARM64 perf problems at a late stage of the release. The sooner we enable the bot for ARM64, the better. Moreover, we should ask for ARM64 results more frequently when reviewing big changes that affect the performance of frequently used features (like sorting arrays).
The study has shown that measuring the performance of GNU libc based Linux distros like Ubuntu is not enough to detect musl libc specific regressions. We should consider adding Alpine runs to the perf lab.
This time no important issues specific to macOS and different CPU families were discovered. It has proven that the perf lab has good hardware coverage.
The Alpine regression has shown that an increased number of Gen 0 collections can be a very valuable metric to detect regressions. We should consider extending the bot to use it.
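As a sketch of what a Gen 0 based check might look like, the snippet below normalizes the number of Gen 0 collections by the number of operations and flags a noticeable increase between base and diff. The function names, the input shape, and the 10% tolerance are all assumptions for illustration; the report does not specify how the bot would use this metric.

```python
def gen0_per_1k_ops(gen0_collections, total_operations):
    """Normalize the Gen 0 collection count to collections per 1000 operations."""
    return 1000 * gen0_collections / total_operations


def gen0_regressed(base, diff, tolerance=0.10):
    """Compare two (gen0_collections, total_operations) pairs.

    Returns True when the diff run triggers more Gen 0 collections per
    operation than the base run by more than `tolerance` (hypothetical 10%).
    """
    base_rate = gen0_per_1k_ops(*base)
    diff_rate = gen0_per_1k_ops(*diff)
    if base_rate == 0:
        # Any collection at all is new behaviour worth looking at.
        return diff_rate > 0
    return diff_rate > base_rate * (1 + tolerance)


if __name__ == "__main__":
    # Made-up counts: the diff build collects four times as often.
    print(gen0_regressed(base=(10, 1_000_000), diff=(40, 1_000_000)))
```

Unlike wall-clock time, a collection count is much less sensitive to machine noise, so a check like this could complement the timing-based test rather than replace it.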

Big thanks to everyone involved!

Discussion area-Meta donotuse_Triaged tenet-performance tenet-performance-benchmarks tracking

adamsitnik

❤33 🎉14 🚀11 👀1 👍1

All 10 comments

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

Dotnet-GitSync-Bot on 4 Sep 2020

Great work @adamsitnik. I am pleased that our systems have improved since last cycle, and that you and @DrewScoggins will be using this exercise to improve them such that more issues are found earlier and not at the end of the cycle.

Cc @Lxiamail @jkotas

no important issues specific to macOS and different CPU families were discovered. It has proven that the perf lab has good hardware coverage.

I thought the perf lab did not cover Mac and only had one type of CPU. How can we catch such regressions before the end of the cycle?

danmosemsft on 4 Sep 2020

👍2

thought the perf lab did not cover Mac and only had one type of CPU.

You are right.

no important issues specific to macOS and different CPU families were discovered. It has proven that the perf lab has good hardware coverage.

I meant that in my study I gathered results for macOS and different CPU families but did not find any important regressions (specific to macOS etc.), so adding macOS or more types of CPUs to the perf lab is not needed.

adamsitnik on 4 Sep 2020

Could an effort like this (script run over manually collected submissions) be made cheap enough that we could do it occasionally during the cycle?

danmosemsft on 4 Sep 2020

👍1

@adamsitnik Great job! Based on the data from this exercise, we will add Alpine to the .NET perf lab.

Lxiamail on 4 Sep 2020

👍2

Is the collected data internal only or available publicly as well?

ladeak on 4 Sep 2020

Could an effort like this (script run over manually collected submissions) be made cheap enough that we could do it occasionally during the cycle?

We talked about this offline, but sharing the comment here too... We'll be putting effort into that between now and the first 6.0 previews, with the goal of completing targeted manual runs for each of the 6.0 Preview/RC releases.

jeffhandley on 5 Sep 2020

👍1

thought the perf lab did not cover Mac and only had one type of CPU.

You are right.

no important issues specific to macOS and different CPU families were discovered. It has proven that the perf lab has good hardware coverage.

I meant that in my study I gathered results for macOS and different CPU families but did not find any important regressions (specific to macOS etc.), so adding macOS or more types of CPUs to the perf lab is not needed.

Just for clarity here: we will have a brand new batch of AMD hardware this calendar year. So we will not be covering macOS in the lab, but we will have both Intel and AMD hardware for coverage.

DrewScoggins on 10 Sep 2020

🚀2 ❤2

@DrewScoggins, do we know what the hardware specs are?