Benchmarkdotnet: Feature proposal: full-featured inprocess toolchain

Created on 20 Oct 2018  路  8Comments  路  Source: dotnet/BenchmarkDotNet

Hi all.

Current implementation of InProcessToolchain is limited a bit. It does not support args, ref structs and so on.

For my pet project I've created a replacement for it, InProcessEmitToolchain. This one provides almost same MSIL(*) the default BDN toolchain does (there is a il diff prooftest for it). The difference is, InProcessEmitToolchain uses Reflection.Emit to produce runnable assembly and runs benchmarks in-process. This is much faster, even for large benchmarks build step takes 5 ms vs 2000ms for RoslynToolchain.

(*)About "almost same" IL output. There are a few shortcuts I've made to reduce amount of code to emit. My implementation uses existing methods for prepare steps (creating the job, setting arg and param values and so on).
Also there are two moments caused by Reflection.Emit limitations.

  • First, IlBuilder emits nop instructions around jumps, c# compiler does not.
  • Second, there are places (for loops mostly) where c# compiler inserts short jump instructions when possible. IjnProcessEmitToolchain uses long jumps for those.

Beside this the emitted code matches output of c# compiler in release mode op-by-op.

The question is: are you interested in merging a new toolchain in BDN?

If yes, I'll do WIP pull request and will ask for rewiew.

Toolchains enhancement

Most helpful comment

These extra NOPs are not coming from jump opcodes, but from the size of the argument you pass to Emit(OpCode, ...). You gave it int(11), which fits 4 bytes, but values are encoded in little endian, so the lowest byte comes first. Luckily for you, opcode 00 is nop. Take care to cast your branches properly if those nops are really a concern to you - they shouldn't be anyways, as I'm pretty sure the JITter doesn't care (I don't know if the JITter for emitted code is the same than the regular one).

The Emit(....) overloads ~do not do any sanity checks for you~ (edit: not exactly true, but true enough in this scenario) - the only checks are done at baketime to make sure shorthand branches are within sbyte value range, AFAIK. As a rule of thumb, always cast your arguments explicitely. If you don't want to, refrain from using the shorthand opcodes, as they are really just a way to reduce the size of the IL code rather than an actual optimization.

All 8 comments

Hello @ig-sinicyn it's great to see you contributing to BenchmarkDotNet again!

This is really great news. I know many users InProcessEmitToolchain who asked me for this, including @stephentoub yesterday ;)

Should we just replace the existing InProcessToolchain with InProcessEmitToolchain?

Second, there are places (for loops mostly) where c# compiler inserts short jump instructions when possible. IjnProcessEmitToolchain uses long jumps for those.

I assume that this is to keep the code generation simpler? if the overhead and actual workload code uses same instructions then it's perfectly fine!

First, IlBuilder emits nop instructions around jumps, c# compiler does not.

why do you need the nops?

You could reuse our existing ArgumentsTests and run them for both toolchains the way we do it for MemoryDiagnoserTests

https://github.com/dotnet/BenchmarkDotNet/blob/90161654725efd5639e0190638a3383d6a49e34c/tests/BenchmarkDotNet.IntegrationTests/MemoryDiagnoserTests.cs#L33-L39

(convering them from Facts to Theories running for current and inprocess toolchain)

Should we just replace the existing InProcessToolchain with InProcessEmitToolchain?

I plan to make InProcessEmitToolchain the default toolchain for in-process benchmarks
and keep old InProcessToolchain (only part with BenchmarkActionCodegen=DelegateCombine) for platforms that do not support Reflection.Emit (UAP or any AOT platforms).

If there's no support for no-emit platforms, the old InProcessToolchain can be dropped entirely.

I assume that this is to keep the code generation simpler?

Yep.

why do you need the nops?

I have no control over that.
The

            ilBuilder.Emit(OpCodes.Ldarg_0);
            ilBuilder.Emit(OpCodes.Ldfld, notElevenField);
            ilBuilder.Emit(OpCodes.Ldc_I4_S, 11);
            ilBuilder.Emit(OpCodes.Bne_Un, notElevenLabel);

emits something like

    IL_0000: ldarg.0
    IL_0001: ldfld int32 BenchmarkDotNet.Autogenerated.Runnable_1::NotEleven
    IL_0006: ldc.i4.s 11
    IL_0008: nop                   // <---
    IL_0009: nop
    IL_000a: nop
    IL_000b: bne.un IL_0017

You could reuse our existing ArgumentsTests and run them for both toolchains the way we do it for MemoryDiagnoserTests

Wow, will do:)
I've just found subltle bug with MemoryDiagnoser (looks like I'm doing unintended allocations somewhere), besides this the PR is ready for rewiev. Will fix errors and push it tomorrow

Okay, I've just found the reason of memory leaks with in-process toolchain, It was the xUnit output listener.
This one works fine.

        private IConfig CreateConfig(IToolchain toolchain)
            => ManualConfig.CreateEmpty()
                .With(Job.ShortRun
                    .WithEvaluateOverhead(false) // no need to run idle for this test
                    .WithWarmupCount(0) // don't run warmup to save some time for our CI runs
                    .WithIterationCount(1) // single iteration is enough for us
                    .WithGcForce(false).With(toolchain))
                .With(DefaultConfig.Instance.GetLoggers().ToArray())
                .With(DefaultColumnProviders.Instance)
                .With(MemoryDiagnoser.Default)
                //.With(new OutputLogger(output)); // <-- !!!
                .With(new ConsoleLogger());

P.S> AFK until evening, sorry.

These extra NOPs are not coming from jump opcodes, but from the size of the argument you pass to Emit(OpCode, ...). You gave it int(11), which fits 4 bytes, but values are encoded in little endian, so the lowest byte comes first. Luckily for you, opcode 00 is nop. Take care to cast your branches properly if those nops are really a concern to you - they shouldn't be anyways, as I'm pretty sure the JITter doesn't care (I don't know if the JITter for emitted code is the same than the regular one).

The Emit(....) overloads ~do not do any sanity checks for you~ (edit: not exactly true, but true enough in this scenario) - the only checks are done at baketime to make sure shorthand branches are within sbyte value range, AFAIK. As a rule of thumb, always cast your arguments explicitely. If you don't want to, refrain from using the shorthand opcodes, as they are really just a way to reduce the size of the IL code rather than an actual optimization.

@Warpten

UPD Dummit!

Just checked it and I'm totally wrong. Sorry, I'm not sure why I did so silly mistake, but I really managed to convince myself the nops are emitted by Emit(..., label) overload.
Sorry again, will recheck that all constant loads do use proper values.

Prev and wrong:

These extra NOPs are not coming from jump opcodes, but from the size of the argument

nope. Nops are used as placeholders for possible fixups. See comment at
https://referencesource.microsoft.com/#mscorlib/system/reflection/emit/ilgenerator.cs,791
for more details.

P.S. Fixups are performed by BakeByteArray()

Fixed by #921. Once again big thanks @ig-sinicyn ! You have done an amazing job!

Excellent. Thanks! When do we expect this version to be released?

Excellent. Thanks! When do we expect this version to be released?

@AndreyAkinshin is planning to release it this week

Was this page helpful?
0 / 5 - 0 ratings