Current codebase, benchmark:
public class Program
{
private const int Count = 10 * 1000;
static void Main(string[] args)
{
BenchmarkRunner.Run<Program>();
}
[Benchmark(Baseline = true)]
public void Baseline() => Thread.SpinWait(Count);
[Benchmark]
public void SlowerX3() => Thread.SpinWait(3 * Count);
}
part of output:
MainWarmup 1: 2048 op, 283982469.15 ns, 138.6633 us/op
MainTarget 1: 2048 op, 283552921.72 ns, 138.4536 us/op
MainTarget 2: 2048 op, 285226436.09 ns, 139.2707 us/op
MainTarget 3: 2048 op, 282084974.32 ns, 137.7368 us/op
MainTarget 4: 2048 op, 284826169.13 ns, 139.0753 us/op
MainTarget 5: 2048 op, 287191547.67 ns, 140.2302 us/op
MainTarget 6: 2048 op, 283267663.59 ns, 138.3143 us/op
MainTarget 7: 2048 op, 283271285.92 ns, 138.3161 us/op
MainTarget 8: 2048 op, 287422772.78 ns, 140.3432 us/op
MainTarget 9: 2048 op, 286220160.72 ns, 139.7559 us/op
MainTarget 10: 2048 op, 285216172.84 ns, 139.2657 us/op
MainTarget 11: 2048 op, 289273479.23 ns, 141.2468 us/op
MainTarget 12: 2048 op, 282808533.84 ns, 138.0901 us/op
MainTarget 13: 2048 op, 288167764.38 ns, 140.7069 us/op
MainTarget 14: 2048 op, 286347545.83 ns, 139.8181 us/op
MainTarget 15: 2048 op, 284229391.01 ns, 138.7839 us/op
MainTarget 16: 2048 op, 286027573.75 ns, 139.6619 us/op
MainTarget 17: 2048 op, 285252697.95 ns, 139.2835 us/op
...
Result 1: 2048 op, 0 ns, 0.0000 ns/op
Result 2: 2048 op, 0 ns, 0.0000 ns/op
Result 3: 2048 op, 0 ns, 0.0000 ns/op
Result 4: 2048 op, 0 ns, 0.0000 ns/op
Result 5: 2048 op, 0 ns, 0.0000 ns/op
Result 6: 2048 op, 0 ns, 0.0000 ns/op
Result 7: 2048 op, 0 ns, 0.0000 ns/op
Result 8: 2048 op, 0 ns, 0.0000 ns/op
Result 9: 2048 op, 0 ns, 0.0000 ns/op
Result 10: 2048 op, 0 ns, 0.0000 ns/op
Result 11: 2048 op, 0 ns, 0.0000 ns/op
Result 12: 2048 op, 0 ns, 0.0000 ns/op
Result 13: 2048 op, 0 ns, 0.0000 ns/op
Result 14: 2048 op, 0 ns, 0.0000 ns/op
Result 15: 2048 op, 0 ns, 0.0000 ns/op
Result 16: 2048 op, 65685984.36 ns, 32.0732 us/op
Result 17: 2048 op, 66107079.7 ns, 32.2788 us/op
Result 18: 2048 op, 66544173.64 ns, 32.4923 us/op
Result 19: 2048 op, 66640768.99 ns, 32.5394 us/op
Result 20: 2048 op, 66662804.8 ns, 32.5502 us/op
Result 21: 2048 op, 66793208.52 ns, 32.6139 us/op
Result 22: 2048 op, 66925423.4 ns, 32.6784 us/op
Looks like there's something wrong with the result calculation (first 16 samples as Unroll factor == 16),
And therefore:
// BaselineScaledColumn.cs
bool invalidResults = baseline == null ||
summary[baseline] == null ||
summary[baseline].ResultStatistics == null ||
!summary[baseline].ResultStatistics.CanBeInverted() ||
summary[benchmark] == null ||
summary[benchmark].ResultStatistics == null;
// ResultStatistics.CanBeInverted()
public bool CanBeInverted() => Min > 1e-9;
Min == 0, CanBeInverted() returns true and the output is:
Method | Mean | StdErr | StdDev | Scaled | Scaled-StdDev |
--------- |----------- |---------- |----------- |------- |-------------- |
Baseline | 8.4592 us | 0.6132 us | 4.8286 us | ? | ? |
SlowerX3 | 25.4619 us | 1.8444 us | 14.5229 us | ? | ? |
I'm seeing the same behavior as well sometimes (with manual build from current master).
Whenever this happens, the remaining printed us/op results are also often increasing (as they are in the OP excerpt)
@adamsitnik, it was your bug. Recently you moved List<Measurement> in EngineWarmupStage and EngineTargetStage from a local variable to a field. Unfortunately, we share these stages between idle and main iterations. As a result, both idle and main lists have references to the same instance of List<Measurement> which has all the measurements. It spoils calculation of the results. I have revered this fix (https://github.com/PerfDotNet/BenchmarkDotNet/commit/8647e737b3473528e96a22f83691d72023bf9daf). You have to find another way to reduce the amount of allocation here.
@AndreyAkinshin Sorry for that! I will find other way.
@adamsitnik, and another bug: you sorted the values in the MainStage. It's wrong because it produces sorted Result measurements (we need the original order in the summary). I pushed another hotfix; it looks ugly and produces another allocation, but it works properly. Please note it when you will refactor these stages in the next time.
@AndreyAkinshin Ok, I have synchronized the gc branch, and found a way to solve this problem:
@adamsitnik, I like the idea, but could you make the source code more pretty? Like create an additional generic class (e.g. ListPool<T>) with nice API? I want to keep the stage classes as simple as possible and hide all "not-about-measurements" code somewhere else.
@AndreyAkinshin done
Most helpful comment
@AndreyAkinshin done