BenchmarkDotNet: Benchmark setup and caching of repeated operations

Created on 19 Sep 2016  ·  7 Comments  ·  Source: dotnet/BenchmarkDotNet

I'm developing a library where an operation might be performed once or multiple times. I'd like to benchmark how long setup takes as well as how much caching might speed up subsequent operations.

For example, say I wanted to investigate the performance of compiled vs static regular expressions.

What I'd like is to be able to do the following:

``` c#
// Proposed syntax: each yielded action would be measured as its own step.
[Benchmark]
public IEnumerable<Action> Regex_Match_Compiled()
{
    string value;
    var regex = new Regex("[ ](.*)!", RegexOptions.Compiled);
    for (int i = 0; i < 3; i++)
    {
        yield return () => value = regex.Match("Hello, World!").Groups[1].Value;
    }
}

[Benchmark]
public IEnumerable<Action> Regex_Match_Static()
{
    string value;
    for (int i = 0; i < 3; i++)
    {
        yield return () => value = Regex.Match("Hello, World!", "[ ](.*)!").Groups[1].Value;
    }
}
```

| Method | Median | StdDev |
| --- | --- | --- |
| Regex_Match_Compiled(1) | 1,035,622.7664 ns | 21,349.3871 ns |
| Regex_Match_Compiled(2) | 10,234.7148 ns | 60,138.4160 ns |
| Regex_Match_Compiled(3) | 10,640.9544 ns | 21,160.7904 ns |
| Regex_Match_Static(1) | 943.3168 ns | 29.4935 ns |
| Regex_Match_Static(2) | 948.445 ns | 67.2509 ns |
| Regex_Match_Static(3) | 798.6434 ns | 19.6410 ns |

At the moment, I could do something like this:

``` c#
        [Benchmark]
        public void Regex_Match_Compiled_1()
        {
            Execute(Regex_Match_Compiled(1));
        }

        [Benchmark]
        public void Regex_Match_Compiled_2()
        {
            Execute(Regex_Match_Compiled(2));
        }

        [Benchmark]
        public void Regex_Match_Compiled_3()
        {
            Execute(Regex_Match_Compiled(3));
        }

        [Benchmark]
        public void Regex_Match_Static_1()
        {
            Execute(Regex_Match_Static(1));
        }

        [Benchmark]
        public void Regex_Match_Static_2()
        {
            Execute(Regex_Match_Static(2));
        }

        [Benchmark]
        public void Regex_Match_Static_3()
        {
            Execute(Regex_Match_Static(3));
        }

        static void Execute(IEnumerable<Action> actions)
        {
            foreach(var action in actions)
            {
                action();
            }
        }

        public IEnumerable<Action> Regex_Match_Compiled(int n)
        {
            string value;
            var regex = new Regex("[ ](.*)!", RegexOptions.Compiled);
            for (int i = 0; i < n; i++)
            {
                yield return () => value = regex.Match("Hello, World!").Groups[1].Value;
            }
        }

        public IEnumerable<Action> Regex_Match_Static(int n)
        {
            string value;
            for (int i = 0; i < n; i++)
            {
                yield return () => value = Regex.Match("Hello, World!", "[ ](.*)!").Groups[1].Value;
            }
        }
```

| Method | Median | StdDev |
| --- | --- | --- |
| Regex_Match_Compiled_1 | 1,035,622.7664 ns | 21,349.3871 ns |
| Regex_Match_Compiled_2 | 1,045,857.4812 ns | 60,138.4160 ns |
| Regex_Match_Compiled_3 | 1,056,498.4356 ns | 21,160.7904 ns |
| Regex_Match_Static_1 | 943.3168 ns | 29.4935 ns |
| Regex_Match_Static_2 | 1,891.7618 ns | 67.2509 ns |
| Regex_Match_Static_3 | 2,690.4052 ns | 19.6410 ns |

It's not pretty and I have to subtract the results to see how long a single operation takes.
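To make the subtraction concrete, here is a small sketch that turns the cumulative medians from the workaround table into per-operation costs. The `PerOperation` class and its `Differences` helper are hypothetical names used only for this illustration; the input numbers are the medians from the table above.

```csharp
using System;
using System.Collections.Generic;

public static class PerOperation
{
    // Given cumulative medians for 1, 2, 3, ... operations, return the cost
    // attributed to each individual operation (the first entry is kept as-is).
    public static decimal[] Differences(IReadOnlyList<decimal> cumulative)
    {
        var result = new decimal[cumulative.Count];
        for (int i = 0; i < cumulative.Count; i++)
            result[i] = i == 0 ? cumulative[0] : cumulative[i] - cumulative[i - 1];
        return result;
    }

    public static void Main()
    {
        // Medians in ns, copied from the workaround table above.
        var compiled = Differences(new[] { 1035622.7664m, 1045857.4812m, 1056498.4356m });
        var staticCosts = Differences(new[] { 943.3168m, 1891.7618m, 2690.4052m });

        Console.WriteLine("Compiled: " + string.Join(" | ", compiled));
        Console.WriteLine("Static:   " + string.Join(" | ", staticCosts));
    }
}
```

The differences reproduce the per-step medians in the first table: for example, the second static match comes out at 948.445 ns and the second compiled match at 10,234.7148 ns.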

Does that make sense or is there already some way to do this?


All 7 comments

You can move initialization of the compiled regex to a setup method and mark one of the benchmarks as a baseline. An example:

``` c#
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;

public class RegexStaticBenchmarks
{
    [Benchmark(Baseline = true)] public string MatchStatic1() => MatchStatic(1);
    [Benchmark] public string MatchStatic2() => MatchStatic(2);
    [Benchmark] public string MatchStatic3() => MatchStatic(3);

    public static string MatchStatic(int n)
    {
        var res = "";
        for (int i = 0; i < n; i++)
            res = Regex.Match("Hello, World!", "[ ](.*)!").Groups[1].Value;
        return res;
    }
}

public class RegexCompiledBenchmarks
{
    private Regex regex;

    // Runs once, before the warm-up iterations.
    [Setup]
    public void Setup() => regex = new Regex("[ ](.*)!", RegexOptions.Compiled);

    [Benchmark] public Regex Initialization() => new Regex("[ ](.*)!", RegexOptions.Compiled);
    [Benchmark(Baseline = true)] public string MatchCompiled1() => MatchCompiled(1);
    [Benchmark] public string MatchCompiled2() => MatchCompiled(2);
    [Benchmark] public string MatchCompiled3() => MatchCompiled(3);

    public string MatchCompiled(int n)
    {
        string res = "";
        for (int i = 0; i < n; i++)
            res = regex.Match("Hello, World!").Groups[1].Value;
        return res;
    }
}
```

RegexStaticBenchmarks:

| Method | Median | StdDev | Scaled | Scaled-SD |
| --- | --: | --: | --: | --: |
| MatchStatic1 | 745.5216 ns | 17.6792 ns | 1.00 | 0.00 |
| MatchStatic2 | 1,488.7698 ns | 48.1934 ns | 1.98 | 0.08 |
| MatchStatic3 | 2,234.1599 ns | 68.8505 ns | 3.01 | 0.11 |

RegexCompiledBenchmarks:

| Method | Median | StdDev | Scaled | Scaled-SD |
| --- | --: | --: | --: | --: |
| Initialization | 48,913.5810 ns | 622.2665 ns | 152.04 | 3.32 |
| MatchCompiled1 | 322.3215 ns | 5.9223 ns | 1.00 | 0.00 |
| MatchCompiled2 | 665.1863 ns | 9.7146 ns | 2.07 | 0.05 |
| MatchCompiled3 | 1,007.8436 ns | 36.0203 ns | 3.14 | 0.12 |

As you can see, MatchXXX-N takes about N times as long as MatchXXX-1, which seems logical. So I don't completely understand why you need several methods with n = 1, 2, 3. It probably makes sense to write the following benchmarks instead:

``` c#
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;

public class RegexBenchmarks
{
    private const string Input = "Hello, World!";
    private const string Pattern = "[ ](.*)!";
    private Regex compiledRegex;

    [Setup] public void Setup() => compiledRegex = new Regex(Pattern, RegexOptions.Compiled);

    [Benchmark] public Regex Compilation() => new Regex(Pattern, RegexOptions.Compiled);
    [Benchmark] public string MatchCompiled() => compiledRegex.Match(Input).Groups[1].Value;
    [Benchmark] public string MatchStatic() => Regex.Match(Input, Pattern).Groups[1].Value;
}
```

| Method | Median | StdDev |
| --- | --: | --: |
| Compilation | 48,636.0659 ns | 506.5380 ns |
| MatchCompiled | 323.0128 ns | 5.4955 ns |
| MatchStatic | 751.6203 ns | 20.6820 ns |

Thanks for looking into this. I've been having a play with using [Setup], but have been struggling to get the results I'm looking for.

I think my confusion has been because [Setup] doesn't behave as someone who is used to NUnit would expect. It isn't called before each [Benchmark] method, but is a one-time-setup that is called even before the warm-up iterations. This means that any caching done by the [Benchmark] methods is lost during warm-up.

If I want to warm-up the code paths, but not any caching the target methods might be doing, I think I'll need to include _everything_ in the [Benchmark] methods.
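To make that lifecycle concrete, here is a minimal sketch of the order of calls as I understand it: one setup call, then warm-up invocations, then measured invocations, all on the same instance. This is a simplified model and not BenchmarkDotNet's actual engine; `CachingTarget`, `LifecycleDemo` and `Simulate` are hypothetical names, and the upper-casing stands in for any expensive work that gets cached.

```csharp
using System;
using System.Collections.Generic;

// Simplified model of the lifecycle described above. This is NOT
// BenchmarkDotNet's engine; all names here are hypothetical.
public class CachingTarget
{
    private readonly Dictionary<string, string> cache = new Dictionary<string, string>();
    public int CacheMisses { get; private set; }

    // Stand-in for a [Setup] method: runs once, before any warm-up.
    public void Setup() => cache.Clear();

    // Stand-in for a [Benchmark] method that caches its work.
    public string Run(string key)
    {
        if (!cache.TryGetValue(key, out var value))
        {
            CacheMisses++;                    // expensive path, taken on a miss only
            value = key.ToUpperInvariant();   // placeholder for the expensive work
            cache[key] = value;
        }
        return value;
    }
}

public static class LifecycleDemo
{
    public static (int duringWarmup, int duringMeasurement) Simulate(int warmup, int measured)
    {
        var target = new CachingTarget();
        target.Setup();                                   // one-time setup

        for (int i = 0; i < warmup; i++) target.Run("hello");
        int missesAfterWarmup = target.CacheMisses;

        for (int i = 0; i < measured; i++) target.Run("hello");
        return (missesAfterWarmup, target.CacheMisses - missesAfterWarmup);
    }

    public static void Main()
    {
        var (w, m) = Simulate(warmup: 5, measured: 10);
        Console.WriteLine($"cache misses during warm-up: {w}, during measurement: {m}");
    }
}
```

In this model the only cache miss happens during warm-up, so the measured invocations never see the uncached cost. That is the effect I was running into.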

Even though the results look believable, I think all of your examples miss the most significant performance hit when using a compiled Regex. I've added a CompilationAndMatch benchmark to your last example.

``` c#
public class RegexBenchmarks
{
    private const string Input = "Hello, World!";
    private const string Pattern = "[ ](.*)!";
    private Regex compiledRegex;

    [Setup]
    public void Setup() => compiledRegex = new Regex(Pattern, RegexOptions.Compiled);

    [Benchmark]
    public Regex Compilation() => new Regex(Pattern, RegexOptions.Compiled);

    [Benchmark]
    public string MatchCompiled() => compiledRegex.Match(Input).Groups[1].Value;

    [Benchmark(Baseline = true)]
    public string MatchStatic() => Regex.Match(Input, Pattern).Groups[1].Value;

    // Constructs a new compiled Regex and immediately uses it for one match.
    [Benchmark]
    public string CompilationAndMatch() => Compilation().Match(Input).Groups[1].Value;
}
```

| Method | Median | StdDev | Scaled | Scaled-SD |
| --- | --- | --- | --- | --- |
| Compilation | 53,004.5134 ns | 557.2312 ns | 63.00 | 1.45 |
| MatchCompiled | 345.5289 ns | 3.2687 ns | 0.41 | 0.01 |
| MatchStatic | 842.1217 ns | 21.9378 ns | 1.00 | 0.00 |
| CompilationAndMatch | 1,037,372.4229 ns | 7,051.5873 ns | 1,235.68 | 27.34 |

I wonder if the Regex is only being compiled the first time Match is called. Does my suggestion make more sense in this context?
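As a quick check on the numbers in the table above, this sketch compares CompilationAndMatch with the sum of Compilation and MatchCompiled. `SumCheck` and `Ratio` are hypothetical names used only for this illustration; the inputs are the medians from the table.

```csharp
using System;

public static class SumCheck
{
    // Ratio of the combined measurement to the sum of the separate ones.
    public static decimal Ratio(decimal combined, params decimal[] parts)
    {
        decimal sum = 0m;
        foreach (var p in parts) sum += p;
        return combined / sum;
    }

    public static void Main()
    {
        // Medians in ns, copied from the table above.
        const decimal compilation = 53004.5134m;
        const decimal matchCompiled = 345.5289m;
        const decimal compilationAndMatch = 1037372.4229m;

        Console.WriteLine($"sum of parts: {compilation + matchCompiled} ns");
        Console.WriteLine($"combined / sum: {Ratio(compilationAndMatch, compilation, matchCompiled):F1}");
    }
}
```

The combined benchmark is roughly 19 times slower than the two separate medians added together, which is why I suspect the separate benchmarks are not capturing all of the work.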

It took me a while to figure out what was going on here. I suspect this is a trap that others will fall into as well! It would have been more obvious what was going on if [Setup] was called [OneTimeSetup].

Would it be feasible to add a [BenchmarkSetup] that would be executed before each [Benchmark]? It would behave in a similar way to NUnit's [SetUp] and MSTest's [TestInitialize]. This would make the examples above behave as expected.

CompilationAndMatch takes a lot of time because it allocates objects on each invocation. If you turn on BenchmarkDotNet.Diagnostics.Windows, you will see a lot of garbage collections in this benchmark.

If you want to measure compilation time plus a single match invocation, you should benchmark them separately and sum the results. It's very important to isolate such pieces of code when you are trying to design a good benchmark. Also, when you use a compiled regex, you typically compile it once and then reuse the result. It doesn't make sense to compile a new regex for each match invocation: it takes much more time because of compilation and memory traffic. You should also understand that it's hard to measure compilation on its own because it also produces allocations.

> If I want to warm-up the code paths, but not any caching the target methods might be doing,

Any good benchmark has a steady state. If it doesn't, you will have a lot of trouble during benchmarking. If you just want to collect a set of measurements (and look at the performance changes between iterations), you should probably use ColdStart and disable warmup.
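For illustration only, the kind of data such a run collects can be sketched by hand with `Stopwatch`: each of the first few invocations is timed separately and nothing is warmed up. This is not how BenchmarkDotNet implements ColdStart; `ColdMeasurements` and `MeasureEach` are hypothetical names, and single-shot readings like these are noisy.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class ColdMeasurements
{
    // Time each of the first `count` invocations separately, with no warm-up,
    // so that first-call costs stay visible instead of being averaged away.
    public static double[] MeasureEach(Func<string> action, int count)
    {
        var nanoseconds = new double[count];
        for (int i = 0; i < count; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            nanoseconds[i] = sw.Elapsed.TotalMilliseconds * 1_000_000.0;
        }
        return nanoseconds;
    }

    public static void Main()
    {
        var regex = new Regex("[ ](.*)!", RegexOptions.Compiled);
        var timings = MeasureEach(() => regex.Match("Hello, World!").Groups[1].Value, 3);
        for (int i = 0; i < timings.Length; i++)
            Console.WriteLine($"call {i + 1}: {timings[i]:F0} ns");
    }
}
```

The absolute numbers will vary from machine to machine and from run to run; only the difference between the first call and the later ones is of interest here.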

> Would it be feasible to add a [BenchmarkSetup] that would be executed before each [Benchmark]?

It's methodologically wrong: you can't get good results if you mix the target iteration with other logic. You have only two options here:

  • Try to separate your logic and measure each piece of code individually.
  • Measure all the pieces at the same time in a single benchmark method.

If you want to understand the performance profile of a method without a steady state, that's a hard problem and it requires a lot of research time.


Also, you should clearly understand what exactly you want to measure and why. You have to clearly formulate the problem you want to solve. For example: is it reasonable to compile a regex if we are going to use it N times? If that's your problem, you can start with the following benchmark (of course, there is a lot of work left here, but it can be a first approximation of the future benchmarks):

``` c#
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;

public class RegexBenchmarks
{
    private const string Input = "Hello, World!";
    private const string Pattern = "[ ](.*)!";

    // Each value of N produces its own row in the results.
    [Params(1, 10, 100, 1000, 10000)]
    public int N { get; set; }

    // Construction is inside the benchmark, so its cost is paid on every invocation.
    [Benchmark(Baseline = true)]
    public string MatchCompiled()
    {
        var regex = new Regex(Pattern, RegexOptions.Compiled);
        string res = "";
        for (int i = 0; i < N; i++)
            res = regex.Match(Input).Groups[1].Value;
        return res;
    }

    [Benchmark]
    public string MatchStatic()
    {
        string res = "";
        for (int i = 0; i < N; i++)
            res = Regex.Match(Input, Pattern).Groups[1].Value;
        return res;
    }
}
```

An example of results:

| Method | N | Median | StdDev | Scaled | Scaled-SD |
| --- | --: | --: | --: | --: | --: |
| MatchCompiled | 1 | 957,771.8155 ns | 16,330.7626 ns | 1.00 | 0.00 |
| MatchStatic | 1 | 918.9901 ns | 90.3230 ns | 0.00 | 0.00 |
| MatchCompiled | 10 | 966,846.4167 ns | 62,114.4948 ns | 1.00 | 0.00 |
| MatchStatic | 10 | 8,699.9858 ns | 434.7926 ns | 0.01 | 0.00 |
| MatchCompiled | 100 | 990,664.1698 ns | 22,190.0849 ns | 1.00 | 0.00 |
| MatchStatic | 100 | 83,075.1658 ns | 1,874.0955 ns | 0.08 | 0.00 |
| MatchCompiled | 1000 | 1,316,740.2252 ns | 16,447.7590 ns | 1.00 | 0.00 |
| MatchStatic | 1000 | 821,359.3913 ns | 19,184.8185 ns | 0.63 | 0.02 |
| MatchCompiled | 10000 | 4,566,998.3342 ns | 96,899.5131 ns | 1.00 | 0.00 |
| MatchStatic | 10000 | 8,206,938.5377 ns | 190,892.1844 ns | 1.80 | 0.05 |
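A rough break-even point can be read off these results with a little arithmetic. The sketch below assumes a linear cost model, which is an assumption and not something measured directly: the compiled variant pays a fixed cost plus a per-match cost, and the static variant pays only a per-match cost. `BreakEven` and `Estimate` are hypothetical names; the inputs are the N = 1 and N = 10000 medians from the table above.

```csharp
using System;

public static class BreakEven
{
    // Assumed linear model (an assumption, not a measured fact):
    //   compiled(N) = fixedCost + N * compiledPerMatch
    //   static(N)   = N * staticPerMatch
    // Returns the N at which both variants cost the same.
    public static double Estimate(double compiledAt1, double compiledAtN, double staticAtN, int n)
    {
        double compiledPerMatch = (compiledAtN - compiledAt1) / (n - 1);
        double fixedCost = compiledAt1 - compiledPerMatch;
        double staticPerMatch = staticAtN / n;
        return fixedCost / (staticPerMatch - compiledPerMatch);
    }

    public static void Main()
    {
        // Medians in ns from the N = 1 and N = 10000 rows above.
        double n = Estimate(957771.8155, 4566998.3342, 8206938.5377, 10000);
        Console.WriteLine($"estimated break-even: about {n:F0} matches");
    }
}
```

With these inputs the estimate lands at roughly two thousand matches, which is consistent with the table: MatchStatic is still faster at N = 1000 and slower at N = 10000.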

Thanks for the example. [Params(...)] looks like a nice feature. :smile:

> CompilationAndMatch takes a lot of time because it allocates objects on each invocation. If you turn on BenchmarkDotNet.Diagnostics.Windows, you will see a lot of garbage collections in this benchmark.

Are you sure it isn't doing anything funky the first time Match is called? If I add N=0 to your example, I get the following. MatchCompiled where N=0 seems different to all the rest.

| Method | N | Median | StdDev | Scaled | Scaled-SD |
| --- | --- | --- | --- | --- | --- |
| MatchCompiled | 0 | 146,685.1900 ns | 0.0000 ns | 1.00 | 0.00 |
| MatchStatic | 0 | 427.6500 ns | 0.0000 ns | 0.00 | 0.00 |
| MatchCompiled | 1 | 1,040,481.2600 ns | 0.0000 ns | 1.00 | 0.00 |
| MatchStatic | 1 | 7,270.1100 ns | 0.0000 ns | 0.01 | 0.00 |
| MatchCompiled | 10 | 1,033,638.8100 ns | 0.0000 ns | 1.00 | 0.00 |
| MatchStatic | 10 | 17,961.4500 ns | 0.0000 ns | 0.02 | 0.00 |
| MatchCompiled | 100 | 1,102,491.0400 ns | 0.0000 ns | 1.00 | 0.00 |
| MatchStatic | 100 | 101,781.5600 ns | 0.0000 ns | 0.09 | 0.00 |
| MatchCompiled | 1000 | 1,461,292.4300 ns | 0.0000 ns | 1.00 | 0.00 |
| MatchStatic | 1000 | 818,529.0300 ns | 0.0000 ns | 0.56 | 0.00 |
| MatchCompiled | 10000 | 4,468,125.0500 ns | 0.0000 ns | 1.00 | 0.00 |
| MatchStatic | 10000 | 8,255,425.5300 ns | 0.0000 ns | 1.85 | 0.00 |

Hmm, now I get it! It looks very interesting. I'd suggest opening the Regex source code and trying to understand what's going on there.

It looks like the regex is parsed and the source gets prepared when the Regex is constructed. When Match gets called it looks in a cache for a compiled assembly (keyed off the source + parameters). Compilation only happens if it can't find anything in the cache. That's what I think is going on - the Regex source is pretty hairy. 😉
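The behaviour I think I'm seeing can be modelled as a cache keyed on the pattern and its options, where the expensive step runs only on a miss. This is an illustrative model of my reading, not the actual Regex implementation; `CompiledCache`, `GetOrCompile` and the `int` options value are hypothetical, and the sketch is single-threaded.

```csharp
using System;
using System.Collections.Generic;

// Illustrative model of "look in a cache keyed off the source and options,
// compile only on a miss". Not the real Regex internals; names are hypothetical.
public class CompiledCache<TCompiled>
{
    private readonly Dictionary<(string Pattern, int Options), TCompiled> cache
        = new Dictionary<(string Pattern, int Options), TCompiled>();

    public int Compilations { get; private set; }

    public TCompiled GetOrCompile(string pattern, int options, Func<string, TCompiled> compile)
    {
        var key = (pattern, options);
        if (!cache.TryGetValue(key, out var compiled))
        {
            Compilations++;               // expensive step, taken on a miss only
            compiled = compile(pattern);
            cache[key] = compiled;
        }
        return compiled;
    }
}

public static class CacheDemo
{
    public static void Main()
    {
        var cache = new CompiledCache<string>();
        Func<string, string> fakeCompile = p => "compiled:" + p;   // placeholder work

        cache.GetOrCompile("[ ](.*)!", 0, fakeCompile);   // miss: compiles
        cache.GetOrCompile("[ ](.*)!", 0, fakeCompile);   // hit: reuses the entry
        cache.GetOrCompile("[ ](.*)!", 1, fakeCompile);   // different options: compiles again

        Console.WriteLine($"compilations: {cache.Compilations}");
    }
}
```

Under this model, the first call with a given pattern and options pays the full cost and every later call with the same key is cheap, which would match the jump between N=0 and N=1 above.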

> I think my confusion has been because [Setup] doesn't behave as someone who is used to NUnit would expect. It isn't called before each [Benchmark] method, but is a one-time-setup that is called even before the warm-up iterations. This means that any caching done by the [Benchmark] methods is lost during warm-up.

@jcansdale I have just added extra explanation to Setup and Cleanup attribute summary to avoid confusion in the future. Thanks for the input!
