Runtime: Performance anomalies in comparison to Mono

Created on 16 Sep 2019 · 26 Comments · Source: dotnet/runtime

Recently I wrote a set of benchmarks to test Unity's Burst compiler against native compilers. Out of curiosity I also included Mono and CoreCLR; the code is available here, in the .NET folder. I noticed strange results in two tests (Sieve of Eratosthenes and Particle Kinematics), where CoreCLR performs much slower than Mono for some reason. I think this requires in-depth analysis by the appropriate .NET Core developers.

I'm happy to provide any additional info or assistance if required.

category:cq
theme:needs-triage
skill-level:expert
cost:medium

area-CodeGen-coreclr tenet-performance

nxrighthere

👍2

All 26 comments

@nxrighthere have you tested Mono-LLVM-JIT (.NET 5 runtime), by the way? 🙂 With recent changes it supports "fast-math", and all Math(F) methods are @llvm.intrinsics (let me know if you need any help setting it up).

EgorBo on 16 Sep 2019

🚀2

Looking at the tests, you may need to do 30+ iterations of the methods for .NET Core 3.0 prior to doing the measurement to allow tiered compilation to kick in.

benaadams on 16 Sep 2019

🚀2

@EgorBo Thanks for the suggestion, going to try to play with it for sure. 👍
@benaadams Yea, I was thinking about it too, going to install .NET Core 3.0 then. Thanks.

nxrighthere on 16 Sep 2019

👀1 🎉1

Dry run:

(CoreCLR 3.0 vs .NET 5 (Mono-LLVM-JIT runtime); our LLVM backend currently uses just a few optimization passes)

Ubuntu 18
Core i7 4930K (Ivy Bridge)

EgorBo on 16 Sep 2019

👀1 🚀1

Also, it seems you use stackalloc a lot. It probably makes sense to clear InitLocals for the whole project, so the memory does not have to be zeroed every time you allocate.

EgorBo on 16 Sep 2019

@benaadams I've tried .NET Core 3.0.100-preview9 and engaged tiered compilation through heavy iteration, but the results are almost the same, unfortunately. 😢

@EgorBo Interesting, I've never heard before about InitLocals. I see many articles around reflection stuff, not sure how to use it properly tho.

nxrighthere on 16 Sep 2019

cc @BruceForstall @sergiy-k

jeffschwMSFT on 17 Sep 2019

@dotnet/jit-contrib

BruceForstall on 18 Sep 2019

@nxrighthere Have you tried disabling tiered compilation (set COMPlus_TieredCompilation=0) to force tier 1 compilation from the start?
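
A minimal sketch of the same setting for Linux or macOS, assuming a POSIX shell (the set form above is Windows command prompt syntax). The project path in the final comment is hypothetical:

```shell
# Disable tiered compilation for every process started from this shell.
export COMPlus_TieredCompilation=0

# Confirm that child processes inherit the setting.
sh -c 'echo "COMPlus_TieredCompilation=$COMPlus_TieredCompilation"'

# Then launch the benchmark as usual, for example (hypothetical project path):
#   dotnet run -c Release --project ./Benchmarks
```

Exporting the variable affects only processes launched from that shell session, so a tiered and a non-tiered run can be compared from two separate terminals.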

Have you seen https://github.com/dotnet/performance? Maybe you should consider contributing the benchmarks to that set, to be run regularly on .NET?

BruceForstall on 18 Sep 2019

@BruceForstall Indeed, set COMPlus_TieredCompilation=0 solved this. Here's the diff with Tiered Compilation disabled.

> Have you seen https://github.com/dotnet/performance? Maybe you should consider contributing the benchmarks to that set, to be run regularly on .NET?

I was not aware of this repository. I will consider contributing to it directly, thank you.

Should I close this issue or keep it open?

nxrighthere on 18 Sep 2019

@nxrighthere Your linked repo mentions ".NET Core 2.2.402". Have you tried with the latest .NET Core 3.0 build to see if there is any difference?

We're always looking for good benchmarks to use for performance comparison. It looks like you've found some where there are perf gaps between RyuJIT and other options that could be investigated.

> Should I close this issue or keep it open?

Seems reasonable to keep it open for now.

BruceForstall on 18 Sep 2019

@BruceForstall Here's the diff with results for 3.0.100-rc1. There's only one noticeable difference: recursive Fibonacci is 22% slower with the new version. All other tests show nearly the same numbers.

nxrighthere on 18 Sep 2019

@EgorBo I'm a bit lost with Mono's LLVM. Is the 6.0.0.334 version on the website able to compile the code with --aot=llvm,llvmllc="-mcpu=* -fp-contract=fast"? Also, what should the -mcpu parameter be set to for AMD FX (Vishera)? Thanks.

nxrighthere on 18 Sep 2019

@nxrighthere --aot=llvm,mcpu=native --ffast-math. But it will be slower than what I tested (mono-netcore-runtime, LLVM JIT): you are going to benchmark "legacy" Mono with LLVM AOT (which has some limitations).
It's a bit difficult to set up mono-netcore for now (netcore/./build.sh --llvm -c Release).

EgorBo on 18 Sep 2019

👍1

@EgorBo Hey Egor, is it possible to build the runtime with LLVM JIT from master on Windows right now?

nxrighthere on 22 Feb 2020

Looks like we never drilled in to understand why Core is slower -- seems like we ought to do so, there may be one or two things there we can address without needing entire new classes of optimization.

AndyAyersMS on 22 Feb 2020

Well, in general, it's all fine right now except places where floating-point arithmetic is involved, since as far as I know there's no equivalent to -ffast-math / /fp:fast in .NET Core.

nxrighthere on 22 Feb 2020

Is there some writeup you can point me at with more details?

AndyAyersMS on 22 Feb 2020

Sure:

nxrighthere on 22 Feb 2020

Thanks. I was actually looking for analysis showing that fast fp is the root cause of the perf differences in Core vs Mono-LLVM. I suspect there's more going on than just that...

AndyAyersMS on 22 Feb 2020

Cc @tannergooding

danmosemsft on 22 Feb 2020

@AndyAyersMS I think one piece of low-hanging fruit is recognizing a*b+c as an FMA.

EgorBo on 22 Feb 2020

@nxrighthere we are still moving things here and there but it's already possible for macOS and Linux:

./build.sh -c Release /p:MonoEnableLLVM=true

then cd to src/mono/netcore
and run

make run-sample

After that you should see a .dotnet-mono folder in the repo root (make sure MONO_ENV_OPTIONS=--llvm is set as an environment variable when you use it to run benchmarks).
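
One way to scope that variable to a single command, sketched for a POSIX shell. The helper name run_with_llvm is hypothetical, and the binary path and assembly name in the final comment are assumptions about the layout of the .dotnet-mono folder:

```shell
# Hypothetical helper: run one external command with MONO_ENV_OPTIONS=--llvm
# set only for that command, leaving the calling shell's environment alone.
run_with_llvm() {
    MONO_ENV_OPTIONS=--llvm "$@"
}

# Check that the variable reaches the child process.
run_with_llvm sh -c 'echo "MONO_ENV_OPTIONS=$MONO_ENV_OPTIONS"'

# Example use (assumed path and assembly name):
#   run_with_llvm ./.dotnet-mono/dotnet MyBenchmarks.dll
```

Keeping the variable out of the shell's own environment makes it harder to leave the LLVM option on by accident when switching back to a CoreCLR run.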

EgorBo on 22 Feb 2020

👍2

After upgrading to .NET 5 Preview 8, I noticed a significant regression in this recursive Fibonacci test. Execution is slower by 40% vs .NET Core 3.1.101 while in other tests .NET 5 shows better results.

nxrighthere on 27 Aug 2020

If you're talking about a regression in CoreCLR perf, it's likely because of #35020.

AndyAyersMS on 27 Aug 2020

👍1
