Hi all.
Current implementation of InProcessToolchain is limited a bit. It does not support args, ref structs and so on.
For my pet project I've created a replacement for it, InProcessEmitToolchain. This one provides almost same MSIL(*) the default BDN toolchain does (there is a il diff prooftest for it). The difference is, InProcessEmitToolchain uses Reflection.Emit to produce runnable assembly and runs benchmarks in-process. This is much faster, even for large benchmarks build step takes 5 ms vs 2000ms for RoslynToolchain.
(*)About "almost same" IL output. There are a few shortcuts I've made to reduce amount of code to emit. My implementation uses existing methods for prepare steps (creating the job, setting arg and param values and so on).
Also there are two moments caused by Reflection.Emit limitations.
Beside this the emitted code matches output of c# compiler in release mode op-by-op.
The question is: are you interested in merging a new toolchain in BDN?
If yes, I'll do WIP pull request and will ask for rewiew.
Hello @ig-sinicyn it's great to see you contributing to BenchmarkDotNet again!
This is really great news. I know many users InProcessEmitToolchain who asked me for this, including @stephentoub yesterday ;)
Should we just replace the existing InProcessToolchain with InProcessEmitToolchain?
Second, there are places (for loops mostly) where c# compiler inserts short jump instructions when possible. IjnProcessEmitToolchain uses long jumps for those.
I assume that this is to keep the code generation simpler? if the overhead and actual workload code uses same instructions then it's perfectly fine!
First, IlBuilder emits nop instructions around jumps, c# compiler does not.
why do you need the nops?
You could reuse our existing ArgumentsTests and run them for both toolchains the way we do it for MemoryDiagnoserTests
(convering them from Facts to Theories running for current and inprocess toolchain)
Should we just replace the existing InProcessToolchain with InProcessEmitToolchain?
I plan to make InProcessEmitToolchain the default toolchain for in-process benchmarks
and keep old InProcessToolchain (only part with BenchmarkActionCodegen=DelegateCombine) for platforms that do not support Reflection.Emit (UAP or any AOT platforms).
If there's no support for no-emit platforms, the old InProcessToolchain can be dropped entirely.
I assume that this is to keep the code generation simpler?
Yep.
why do you need the nops?
I have no control over that.
The
ilBuilder.Emit(OpCodes.Ldarg_0);
ilBuilder.Emit(OpCodes.Ldfld, notElevenField);
ilBuilder.Emit(OpCodes.Ldc_I4_S, 11);
ilBuilder.Emit(OpCodes.Bne_Un, notElevenLabel);
emits something like
IL_0000: ldarg.0
IL_0001: ldfld int32 BenchmarkDotNet.Autogenerated.Runnable_1::NotEleven
IL_0006: ldc.i4.s 11
IL_0008: nop // <---
IL_0009: nop
IL_000a: nop
IL_000b: bne.un IL_0017
You could reuse our existing ArgumentsTests and run them for both toolchains the way we do it for MemoryDiagnoserTests
Wow, will do:)
I've just found subltle bug with MemoryDiagnoser (looks like I'm doing unintended allocations somewhere), besides this the PR is ready for rewiev. Will fix errors and push it tomorrow
Okay, I've just found the reason of memory leaks with in-process toolchain, It was the xUnit output listener.
This one works fine.
private IConfig CreateConfig(IToolchain toolchain)
=> ManualConfig.CreateEmpty()
.With(Job.ShortRun
.WithEvaluateOverhead(false) // no need to run idle for this test
.WithWarmupCount(0) // don't run warmup to save some time for our CI runs
.WithIterationCount(1) // single iteration is enough for us
.WithGcForce(false).With(toolchain))
.With(DefaultConfig.Instance.GetLoggers().ToArray())
.With(DefaultColumnProviders.Instance)
.With(MemoryDiagnoser.Default)
//.With(new OutputLogger(output)); // <-- !!!
.With(new ConsoleLogger());
P.S> AFK until evening, sorry.
These extra NOPs are not coming from jump opcodes, but from the size of the argument you pass to Emit(OpCode, ...). You gave it int(11), which fits 4 bytes, but values are encoded in little endian, so the lowest byte comes first. Luckily for you, opcode 00 is nop. Take care to cast your branches properly if those nops are really a concern to you - they shouldn't be anyways, as I'm pretty sure the JITter doesn't care (I don't know if the JITter for emitted code is the same than the regular one).
The Emit(....) overloads ~do not do any sanity checks for you~ (edit: not exactly true, but true enough in this scenario) - the only checks are done at baketime to make sure shorthand branches are within sbyte value range, AFAIK. As a rule of thumb, always cast your arguments explicitely. If you don't want to, refrain from using the shorthand opcodes, as they are really just a way to reduce the size of the IL code rather than an actual optimization.
@Warpten
UPD Dummit!
Just checked it and I'm totally wrong. Sorry, I'm not sure why I did so silly mistake, but I really managed to convince myself the nops are emitted by Emit(..., label) overload.
Sorry again, will recheck that all constant loads do use proper values.
Prev and wrong:
These extra NOPs are not coming from jump opcodes, but from the size of the argument
nope. Nops are used as placeholders for possible fixups. See comment at
https://referencesource.microsoft.com/#mscorlib/system/reflection/emit/ilgenerator.cs,791
for more details.
P.S. Fixups are performed by BakeByteArray()
Fixed by #921. Once again big thanks @ig-sinicyn ! You have done an amazing job!
Excellent. Thanks! When do we expect this version to be released?
Excellent. Thanks! When do we expect this version to be released?
@AndreyAkinshin is planning to release it this week
Most helpful comment
These extra NOPs are not coming from jump opcodes, but from the size of the argument you pass to Emit(OpCode, ...). You gave it
int(11), which fits 4 bytes, but values are encoded in little endian, so the lowest byte comes first. Luckily for you, opcode00is nop. Take care to cast your branches properly if those nops are really a concern to you - they shouldn't be anyways, as I'm pretty sure the JITter doesn't care (I don't know if the JITter for emitted code is the same than the regular one).The
Emit(....)overloads ~do not do any sanity checks for you~ (edit: not exactly true, but true enough in this scenario) - the only checks are done at baketime to make sure shorthand branches are withinsbytevalue range, AFAIK. As a rule of thumb, always cast your arguments explicitely. If you don't want to, refrain from using the shorthand opcodes, as they are really just a way to reduce the size of the IL code rather than an actual optimization.