BenchmarkDotNet: Is there something wrong with this result?

Created on 25 May 2019 · 13 Comments · Source: dotnet/BenchmarkDotNet

I ran a few tests this way (using the console tool to test different runtime versions), but each time I got a different ranking.

```C#
DateTime result = instance.CreateTime;
```

First:

``` ini
// * Summary *

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17134.765 (1803/April2018Update/Redstone4)
Intel Core i7-6700HQ CPU 2.60GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
Frequency=2531254 Hz, Resolution=395.0611 ns, Timer=TSC
.NET Core SDK=2.2.204
  [Host]     : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT
  Job-RRJNSC : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT
  Job-MDBOLW : .NET Core 2.1.11 (CoreCLR 4.6.27617.04, CoreFX 4.6.27617.02), 64bit RyuJIT
  Job-XBBZHA : .NET Core 2.2.5 (CoreCLR 4.6.27617.05, CoreFX 4.6.27618.01), 64bit RyuJIT

Runtime=Core  Categories=Read,Time

| Method | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------- |-------------- |----------:|----------:|----------:|----------:|----------:|----------:|------:|--------:|-----:|------:|------:|------:|----------:|
| Origin | netcoreapp2.0 | 0.0008 ns | 0.0028 ns | 0.0026 ns | 0.0000 ns | 0.0000 ns | 0.0100 ns | ? | ? | 1 | - | - | - | - |
| Origin | netcoreapp2.1 | 0.0030 ns | 0.0095 ns | 0.0089 ns | 0.0000 ns | 0.0000 ns | 0.0337 ns | ? | ? | 1 | - | - | - | - |
| Origin | netcoreapp2.2 | 0.3825 ns | 0.0079 ns | 0.0070 ns | 0.3833 ns | 0.3703 ns | 0.3924 ns | ? | ? | 2 | - | - | - | - |

// * Warnings *
ZeroMeasurement
  DynamicCallFieldTest.Origin: Runtime=Core, Toolchain=netcoreapp2.0 -> The method duration is indistinguishable from the empty method duration
  DynamicCallFieldTest.Origin: Runtime=Core, Toolchain=netcoreapp2.1 -> The method duration is indistinguishable from the empty method duration

// * Hints *
Outliers
  DynamicCallFieldTest.Origin: Runtime=Core, Toolchain=netcoreapp2.2 -> 1 outlier  was  removed (2.45 ns)
```

Second:

``` ini
// * Summary *

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17134.765 (1803/April2018Update/Redstone4)
Intel Core i7-6700HQ CPU 2.60GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
Frequency=2531254 Hz, Resolution=395.0611 ns, Timer=TSC
.NET Core SDK=2.2.204
  [Host]     : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT
  Job-SDDWOR : .NET Core 2.2.5 (CoreCLR 4.6.27617.05, CoreFX 4.6.27618.01), 64bit RyuJIT
  Job-OXUJBP : .NET Core 2.0.9 (CoreCLR 4.6.26614.01, CoreFX 4.6.26614.01), 64bit RyuJIT
  Job-LEIQTS : .NET Core 2.1.11 (CoreCLR 4.6.27617.04, CoreFX 4.6.27617.02), 64bit RyuJIT

Runtime=Core  Categories=Read,Time

| Method | Toolchain | Mean | Error | StdDev | Min | Max | Median | Ratio | RatioSD | Rank | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------- |-------------- |----------:|----------:|----------:|----------:|----------:|----------:|------:|--------:|-----:|------:|------:|------:|----------:|
| Origin | netcoreapp2.2 | 0.0150 ns | 0.0066 ns | 0.0194 ns | 0.0000 ns | 0.1049 ns | 0.0127 ns | ? | ? | 1 | - | - | - | - |
| Origin | netcoreapp2.0 | 0.2881 ns | 0.0439 ns | 0.0856 ns | 0.0000 ns | 0.3514 ns | 0.3128 ns | ? | ? | 2 | - | - | - | - |
| Origin | netcoreapp2.1 | 0.3031 ns | 0.0188 ns | 0.0167 ns | 0.2814 ns | 0.3426 ns | 0.2975 ns | ? | ? | 3 | - | - | - | - |

// * Warnings *
MultimodalDistribution
  DynamicCallFieldTest.Origin: Runtime=Core, Toolchain=netcoreapp2.2 -> It seems that the distribution is bimodal (mValue = 3.35)
ZeroMeasurement
  DynamicCallFieldTest.Origin: Runtime=Core, Toolchain=netcoreapp2.2 -> The method duration is indistinguishable from the empty method duration

// * Hints *
Outliers
  DynamicCallFieldTest.Origin: Runtime=Core, Toolchain=netcoreapp2.0 -> 4 outliers were removed, 9 outliers were detected (1.98 ns..2.21 ns, 2.36 ns..2.40 ns)
  DynamicCallFieldTest.Origin: Runtime=Core, Toolchain=netcoreapp2.1 -> 1 outlier  was  removed (2.47 ns)
```

Is there something wrong with this result?

NMSAzulX

All 13 comments

@NMSAzulX thanks for the question! It seems that the .NET Core 2.0 version takes 0 CPU cycles in the first case and 1 CPU cycle in the second case. It's a pretty interesting situation. Could you please share the source code of your benchmark and the full BenchmarkDotNet output (you can find the .log file in the BenchmarkDotNet.Artifacts folder) for runs with different rank values?

AndreyAkinshin on 25 May 2019
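
For scale, the "1 CPU cycle" mentioned above can be converted to time using the nominal clock rate from the summary header (2.60 GHz). This is only a rough sketch: it assumes the core runs at its nominal frequency, and turbo boost or throttling would change the figure.

$$
t_{\text{cycle}} = \frac{1}{f_{\text{CPU}}} = \frac{1}{2.60 \times 10^{9}\ \text{Hz}} \approx 0.385\ \text{ns}
$$

That is close to the 0.3825 ns mean reported for netcoreapp2.2 in the first run. The same arithmetic applied to the timer line gives the reported resolution:

$$
t_{\text{tick}} = \frac{1}{2531254\ \text{Hz}} \approx 395.06\ \text{ns}
$$

One timer tick is therefore roughly a thousand times longer than one CPU cycle, so a sub-nanosecond mean can only come from timing many invocations together and dividing by their count.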

@AndreyAkinshin Thank you very much. Here is the Source Project and Log Folder

The content looks like this:

```C#
Example example = new Example();
var result = example.Time;
```

Command:

```
dotnet run -c Release -f netcoreapp2.2 --runtimes netcoreapp2.0 netcoreapp2.1 netcoreapp2.2 --filter "DynamicCallFieldTest"
```

NMSAzulX on 25 May 2019

@NMSAzulX thanks! It seems that this benchmark should always take 0 CPU cycles with all .NET Core versions. However, sometimes we get an error of 1 CPU cycle (about 0.3 ns). Typically, in such cases, we have problems with loop alignment (the actual performance depends on the address of the first instruction of the main benchmark loop). We try to resolve such problems with the help of UnrollFactor, which is 16 by default. My first hypothesis: it's not enough in your case. Could you please rerun the benchmark with the `--unrollFactor 64` command-line switch and show the new logs?

AndreyAkinshin on 25 May 2019
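
The same setting can probably also be fixed in code instead of on the command line. The following is a minimal sketch, not the project's actual source: `UnrollConfig` and the `Example` type are hypothetical stand-ins, and the calls are written against the 0.11.x API used in this thread (newer BenchmarkDotNet versions name the `Add` method `AddJob`). It assumes the BenchmarkDotNet NuGet package is referenced.

```csharp
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

// Hypothetical config: a default job whose unroll factor is raised from 16 to 64.
public class UnrollConfig : ManualConfig
{
    public UnrollConfig()
    {
        // 0.11.x spelling; in newer versions this is AddJob(...).
        Add(Job.Default.WithUnrollFactor(64));
    }
}

// Hypothetical stand-in for the type read in the benchmark.
public class Example
{
    public DateTime Time { get; } = DateTime.Now;
}

[Config(typeof(UnrollConfig))]
public class DynamicCallFieldTest
{
    private readonly Example instance = new Example();

    // Returning the value keeps the JIT from discarding the property read.
    [Benchmark]
    public DateTime Origin() => instance.Time;
}

public static class Program
{
    public static void Main(string[] args) => BenchmarkRunner.Run<DynamicCallFieldTest>();
}
```

Keeping the value in a config class makes the run reproducible without remembering the switch. The command-line form is quicker when trying several values.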

I have just done three tests and the report is as follows:
Test 1, Test 2, Test 3

NMSAzulX on 25 May 2019

Is the code so simple that it takes less than 0.0001 ns? :)

NMSAzulX on 25 May 2019

@AndreyAkinshin Hello, I've uploaded the new test log.

NMSAzulX on 26 May 2019

> I have just done three tests and the report is as follows:

It seems that the problem is resolved now. Probably, we should increase the default value of the unroll factor. @adamsitnik what do you think?

> Is the code so simple that it takes less than 0.0001 ns? :)

Under the summary table, you have the following warnings:

// * Warnings *
ZeroMeasurement
  DynamicCallFieldTest.Origin: Runtime=Core, Toolchain=netcoreapp2.2, UnrollFactor=64 -> The method duration is indistinguishable from the empty method duration
  DynamicCallFieldTest.Origin: Runtime=Core, Toolchain=netcoreapp2.0, UnrollFactor=64 -> The method duration is indistinguishable from the empty method duration
  DynamicCallFieldTest.Origin: Runtime=Core, Toolchain=netcoreapp2.1, UnrollFactor=64 -> The method duration is indistinguishable from the empty method duration

The message "The method duration is indistinguishable from the empty method duration" means that your method takes the same time as the following empty method (in the context of benchmarking, where we call your method many times in a loop):

public int Empty() => 0;

AndreyAkinshin on 26 May 2019

@AndreyAkinshin Hello~

However, the results are not very stable; the rankings in the last three tests fluctuated a little.
Test 2:

Test 3:

Can some kind of constraint be used to avoid this problem?
For example:
`for (int i = 0; i < length; i++) { var result = Empty(); }`

I used this approach for a new round of testing.

NewTest1, NewTest2, NewTest3

The result was unexpected, but it was much more stable.
With this result, I am confused about the performance of .NET Core 2.2.

That's amazing.

NMSAzulX on 26 May 2019

> However, the results are not very stable; the rankings in the last three tests fluctuated a little.

It's just random noise, which is always present in measurements. If you look at the Rank column, you should see 1 for each toolchain. It means that there is no statistically significant difference between the results (BenchmarkDotNet performs an honest statistical test).

> for (int i = 0; i < length; i++)

Typically, you shouldn't write your own loop because BenchmarkDotNet will generate it for you internally.

AndreyAkinshin on 26 May 2019
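
To illustrate the advice about loops, here is a hedged sketch of both shapes. The `Example` type and the loop length are hypothetical and not taken from the project's source; the sketch assumes the BenchmarkDotNet NuGet package is referenced. If a manual loop is kept anyway, the `OperationsPerInvoke` property of `[Benchmark]` tells BenchmarkDotNet how many operations one call performs, so the reported time is per operation rather than per loop.

```csharp
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Hypothetical stand-in for the type read in the benchmark.
public class Example
{
    public DateTime Time { get; } = DateTime.Now;
}

public class DynamicCallFieldTest
{
    private const int Length = 1000; // hypothetical loop length
    private readonly Example instance = new Example();

    // Preferred shape: a single operation per call.
    // BenchmarkDotNet generates and unrolls the surrounding loop itself.
    [Benchmark]
    public DateTime Origin() => instance.Time;

    // Manual loop: declare the operation count so results are divided by it.
    // The measurement now also includes the loop and accumulation overhead.
    [Benchmark(OperationsPerInvoke = Length)]
    public long OriginManualLoop()
    {
        long ticks = 0;
        for (int i = 0; i < Length; i++)
        {
            // Accumulate and return the value so the reads are not eliminated;
            // unchecked makes the intentional wrap-around explicit.
            unchecked { ticks += instance.Time.Ticks; }
        }
        return ticks;
    }
}

public static class Program
{
    public static void Main(string[] args) => BenchmarkRunner.Run<DynamicCallFieldTest>();
}
```

The two methods are not expected to report identical numbers, since the manual loop measures its own bookkeeping as well as the property read.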

@AndreyAkinshin I used to stare at “Mean”. Thanks. Got it.
By the way, is there any documentation on the official website that tells us how to analyze these results?

NMSAzulX on 26 May 2019

I'm working on such a guide but it's not easy to briefly cover all the important cases. I really want to create a good cheat sheet for BenchmarkDotNet output and include it in the documentation, but it's not ready yet.
By the way, you can find a lot of useful information in my book Pro .NET Benchmarking (It's already available for pre-order; it should be released this summer).

AndreyAkinshin on 26 May 2019

Okay, thank you very much. I already have the motivation to learn.

NMSAzulX on 26 May 2019

> we should increase the default value of the unroll factor. @adamsitnik what do you think?

You are most probably right. We hit https://github.com/dotnet/performance/pull/511 after updating to the latest BDN, and from a few runs with `--unrollFactor 32` I can tell that it makes all the nano-benchmarks much more stable. I will try to find some time to run all the benchmarks from the performance repo, see the difference, and let you know.