Benchmarkdotnet: Improve the performance of BenchmarkDotNet

Created on 21 Sep 2017 · 8 Comments · Source: dotnet/BenchmarkDotNet

Some of our users, like @benaadams, run tons of benchmarks, which takes a lot of time (example).

I have a lot of ideas for speeding things up.

They include:

  1. Build everything in parallel first, and then run each benchmark sequentially to preserve correctness.
  2. Make sure that only the diagnosers that really need an extra run perform an extra run. For example, the disassembly diagnoser could just attach at the end, and the memory diagnoser could get the allocated memory without any overhead for .NET Core in silent mode.
  3. Profile everything and make sure we are as fast as possible.
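
The first idea can be sketched roughly as follows. This is a minimal illustration in Python, not BenchmarkDotNet code: `build`, `run`, and `build_then_run` are hypothetical stand-ins, and the benchmark names are made up. The point is only the ordering: builds overlap, measurements do not.

```python
from concurrent.futures import ThreadPoolExecutor


def build(name):
    # Hypothetical stand-in for compiling one benchmark project.
    return f"{name}.exe"


def run(executable):
    # Hypothetical stand-in for executing and measuring one benchmark.
    return f"ran {executable}"


def build_then_run(names, max_workers=4):
    # Phase 1: builds do not influence measurements, so they may overlap.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        executables = list(pool.map(build, names))  # map keeps input order
    # Phase 2: run one benchmark at a time so runs never compete for the CPU.
    return [run(exe) for exe in executables]


results = build_then_run(["Sort", "Parse", "Hash"])
```

Keeping the run phase strictly sequential is what protects the accuracy of the measurements; only the build phase is parallelized.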

I am busy now, but I have no conferences scheduled for November/December, so I will finally have some time for bigger changes.

Diagnosers Engine Toolchains enhancement

All 8 comments

Small progress: there is no need to run the benchmarks one more time when DisassemblyDiagnoser is used. After we are done benchmarking, the child process sends a signal to the parent process and waits for a response. The parent attaches to it with ClrMD and gets the disassembly. After that, it unblocks the child process. This is done after running all benchmarks, so it does not affect accuracy in any way ;)
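
The handshake described above can be sketched roughly as follows. This is only an analogy in Python, not the actual implementation: threads stand in for the child and parent processes, `threading.Event` stands in for the signal, and the "attach with ClrMD" step is just a placeholder log entry. All names here are hypothetical.

```python
import threading


def run_child(ready, resume, log):
    log.append("child: benchmarks finished")
    ready.set()    # signal the parent that measuring is over
    resume.wait()  # stay alive until the parent has finished inspecting us
    log.append("child: exiting")


def run_parent():
    log = []
    ready, resume = threading.Event(), threading.Event()
    child = threading.Thread(target=run_child, args=(ready, resume, log))
    child.start()
    ready.wait()   # wait for the child's signal
    log.append("parent: attach and collect disassembly")  # placeholder step
    resume.set()   # unblock the child
    child.join()
    return log


handshake_log = run_parent()
```

Because the parent only attaches after the child reports that benchmarking is finished, the inspection step cannot overlap with any measurement.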

TLDR: For those who used DisassemblyDiagnoser the speedup is x2 ;)

More progress: MemoryDiagnoser no longer requires an extra run of the benchmark, both for classic .NET and for .NET Core. See #606 for more details.

TLDR: For those who used MemoryDiagnoser the speedup is x2 ;)

Another improvement: the benchmarks are now built in parallel. The more benchmarks you have (especially .NET Core ones), the bigger the gain.

Example: 4 .NET Core benchmarks (no SSD, 4 cores) => build time reduced from 40s to 14s.

Details: the build is now silent (no output to the log). If something fails, the failure is printed afterwards.

@adamsitnik, awesome!

Another improvement will be #699: generate one executable per runtime settings.

Initial improvements for #699: building the benchmarks for our Samples project (more than 650 benchmarks) takes 13 seconds in total on my PC.

When I had 600 benchmarks and each took 6 s to build, the total was 600 × 6 s = 3,600 s = 60 min = 1 h.

So the more benchmarks you have, the bigger the gain.

I just finished running CoreCLR benchmark suite with different versions of BenchmarkDotNet. The results:

hh:mm:ss
04:13:17 - initial time (0.10.11)
02:18:24 - improved MemoryDiagnoser + parallel builds (0.10.12)
02:08:54 - build one exe per runtime (0.10.14.514)
00:49:16 - don't execute long operations more than once per iteration #760 (0.10.14.553)
00:26:17 - have two main actions: with unroll and without, for no unroll increase the step by 1 in pilot (not *2) #771 (0.10.14.569)

which means that we went from 4 hours 13 minutes with 0.10.11 to 26 minutes 17 seconds = 9.63 times faster now ;).
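
The arithmetic behind these numbers can be reproduced with a short script. This is a minimal sketch in Python; the timings are copied from the list above, and `to_seconds` is a helper introduced here for illustration. The overall ratio is 15,197 s / 1,577 s, which is roughly 9.6.

```python
def to_seconds(hms):
    # Convert an "hh:mm:ss" string to a number of seconds.
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (BenchmarkDotNet version, total run time) as reported above
timings = [
    ("0.10.11", "04:13:17"),
    ("0.10.12", "02:18:24"),
    ("0.10.14.514", "02:08:54"),
    ("0.10.14.553", "00:49:16"),
    ("0.10.14.569", "00:26:17"),
]

baseline = to_seconds(timings[0][1])
speedups = {version: baseline / to_seconds(t) for version, t in timings}

for version, factor in speedups.items():
    print(f"{version}: {factor:.2f}x faster than 0.10.11")
```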

As soon as #771 gets merged I am going to close this issue.

The CI package is already available; the version number is 0.10.14.580.
