Runtime: Consider recompiling CoreCLR's unmanaged code to mitigate JCC erratum (Nov. 2019)

Created on 14 Nov 2019 · 6 Comments · Source: dotnet/runtime

Intel has recently released a microcode update that adversely affects the performance of... all jumps (!):

https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf

I've personally had the "pleasure" of encountering this whilst benchmarking CoreCLR 3.0 Array.Sort() on clearlinux, which seems to have pushed this microcode update earlier than others:

https://twitter.com/damageboy/status/1194751035136450560?s=20

For context, here's the before/after perf hit on Array.Sort():

Before: [image]

After: [image]

Very specifically, I was hit with a 20% drop in Array.Sort(), and a smaller hit (by sheer luck, I presume?) with my own code. Intel claims that the average drop in performance should be in the 0%-4% range.

The erratum claims that the microcode update adversely affects:

In this context, Jump Instructions include all jump types: conditional jump (Jcc), macro-fused op-Jcc (where op is one of cmp, test, add, sub, and, inc, or dec), direct unconditional jump, indirect jump, direct/indirect call, and return.

The recommended fix, per the erratum, is to recompile the code with (Section 3.1.1):
-mbranches-within-32B-boundaries
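
To make the flag's name concrete: Intel's document describes the affected jumps as those that cross a 32-byte boundary or end on one. The snippet below is a minimal sketch of that check, not code from the erratum or from CoreCLR. The helper name `touches_boundary` is hypothetical, and reading "ends on a boundary" as "the last byte is the final byte of a 32-byte block" is my assumption. For a macro-fused pair, `length` would cover both instructions.

```python
BOUNDARY = 32  # bytes; the alignment named in -mbranches-within-32B-boundaries


def touches_boundary(start: int, length: int, boundary: int = BOUNDARY) -> bool:
    """Return True if an instruction occupying [start, start + length)
    crosses a boundary or ends exactly on one.

    Hypothetical helper for illustration only.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    last = start + length - 1  # address of the instruction's final byte
    crosses = start // boundary != last // boundary
    # Assumption: "ends on a boundary" means the next byte starts a new block.
    ends_on = (last + 1) % boundary == 0
    return crosses or ends_on


if __name__ == "__main__":
    # (start address, instruction length in bytes)
    for start, length in [(0x10, 2), (0x1E, 2), (0x1F, 2), (0x20, 6)]:
        print(hex(start), length, touches_boundary(start, length))
```

A compiler or assembler applying the mitigation would pad or realign code so that no jump falls into the `True` cases.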

In theory, this could be addressed with a relatively minor(?) CoreCLR runtime release, or perhaps incorporated into the 3.1 release cycle?

In addition, I think we should have a discussion about possibly applying similar (conditional) fixes to the JIT itself w.r.t the generated code on Intel processors, as it doesn't seem this issue is about to simply go away on its own.

I will open a separate issue for the JIT related discussion.

area-Meta tenet-performance untriaged

All 6 comments

Thanks for opening the issue on this. We have been looking into this.

Separately, since you used Array.Sort as an example, it's no longer implemented in C++; as of last week the implementation is all C#. I'm curious if that affects your stated impact at all, or if you see a similar effect on Array.Sort with the latest master?

@stephentoub That's a very good question.
I haven't tested with master yet, and I am aware of the C++ -> C# transition for that specific piece of code.

My code itself (in this case the Scalar + Unmanaged variations of IntroSort) is definitely affected (as seen in the same screenshot), with very wild outcomes:

  • Scalar, for example, suffers a 13% perf drop.
  • Unmanaged is hit with a very minor 1.4% drop.

Given that it was pretty much a copy-paste of IntroSort from CoreCLR to begin with, I think this gives you a "preview" of where this might go.

I assume that the effects are random in nature, depending on code generation/allocation address randomness.
But in general, the branchier your code is, the harder you'll probably get hit, which can also be seen in my two examples: these are practically the same function, except that only one of them has extra bounds checking that is not eliminated normally (nor did I attempt to optimize it away).

My other benchmarks actually end up doing substantially fewer branches (as in orders of magnitude fewer), so I suspect I won't see a lot of movement on that front...

Just wanted to circle back and say that I managed to revert the microcode update in a very localized way on clearlinux (or, I assume, any Linux):

  1. Change the kernel boot parameters to avoid early loading of microcode updates (remove the initrd=/EFI/org.clearlinux/freestanding-00-intel-ucode.cpio from the boot cmd line)
  2. Revert to an older package of microcode from Intel to make sure that late loading of microcode does not sneak the update back in, by updating /lib/firmware/intel-ucode/ to the contents of the 2019-09-18 ucode release

After doing those two steps and (naturally) rebooting, I'm back to the older microcode (0xb4) on my Kaby Lake processor:

$ grep 'stepping\|model\|microcode' /proc/cpuinfo | head -4
model       : 158
model name  : Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
stepping    : 9
microcode   : 0xb4
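
For anyone who wants to script the same check (for example, before trusting a benchmark run), here is a small sketch that pulls the microcode revision out of /proc/cpuinfo-style text. The function name `microcode_revisions` is hypothetical; the only assumption about the input is the `key : value` layout shown in the grep output above.

```python
def microcode_revisions(cpuinfo_text: str) -> set:
    """Collect the distinct microcode revisions reported in /proc/cpuinfo text.

    Hypothetical helper: assumes 'microcode : 0x..' lines as in the output above.
    """
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))  # base 16 accepts the 0x prefix
    return revisions


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            found = microcode_revisions(f.read())
    except FileNotFoundError:
        found = set()  # not on Linux, nothing to report
    print(sorted(hex(r) for r in found))
```

On the machine above, after the rollback, this would be expected to report only the 0xb4 revision.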

Rerunning the same benchmarks now yields the good old results I've been seeing in the past months:

BenchmarkDotNet=v0.12.0, OS=clear-linux-os 31620
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=3.0.100
  [Host]     : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
  Job-BMTZZL : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT

InvocationCount=3  IterationCount=15  LaunchCount=2  
UnrollFactor=1  WarmupCount=10  

| Method | N | Mean [us] | Time / N [ns] |
|---------- |--------- |---------------:|--------------:|
| ArraySort | 100 | 2.795 | 27.9545 |
| Scalar | 100 | 4.511 | 45.1068 |
| Unmanaged | 100 | 4.154 | 41.5399 |
| | | | |
| ArraySort | 1000 | 35.758 | 35.7583 |
| Scalar | 1000 | 58.767 | 58.7666 |
| Unmanaged | 1000 | 42.754 | 42.7538 |
| | | | |
| ArraySort | 10000 | 507.538 | 50.7538 |
| Scalar | 10000 | 680.009 | 68.0009 |
| Unmanaged | 10000 | 516.547 | 51.6547 |
| | | | |
| ArraySort | 100000 | 5,920.847 | 59.2085 |
| Scalar | 100000 | 6,678.108 | 66.7811 |
| Unmanaged | 100000 | 5,585.576 | 55.8558 |
| | | | |
| ArraySort | 1000000 | 69,018.648 | 69.0186 |
| Scalar | 1000000 | 76,603.616 | 76.6036 |
| Unmanaged | 1000000 | 65,503.492 | 65.5035 |
| | | | |
| ArraySort | 10000000 | 794,042.304 | 79.4042 |
| Scalar | 10000000 | 882,212.364 | 88.2212 |
| Unmanaged | 10000000 | 743,297.864 | 74.3298 |
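
As a reading aid for the table: the last column appears to be the mean divided by N, converted from microseconds to nanoseconds. The sketch below checks that against a few rows copied from the table. The helper name `time_per_element_ns` is mine, and the small differences come from the mean being rounded to three decimals in the report.

```python
def time_per_element_ns(mean_us: float, n: int) -> float:
    """Convert a mean time in microseconds for N elements to ns per element."""
    return mean_us * 1000.0 / n


# (method, N, mean in us, reported Time / N in ns), copied from the table above
rows = [
    ("ArraySort", 100, 2.795, 27.9545),
    ("Scalar", 1000, 58.767, 58.7666),
    ("Unmanaged", 10_000_000, 743_297.864, 74.3298),
]

if __name__ == "__main__":
    for method, n, mean_us, reported_ns in rows:
        computed = time_per_element_ns(mean_us, n)
        print(f"{method:10s} N={n:<9d} computed={computed:.4f} reported={reported_ns}")
```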

@brianrob is work planned here? Can you provide guidance on what to do with this and https://github.com/dotnet/runtime/issues/13795

I hope to have all of the data in a format for sharing shortly. The short answer is that I don't think there is any immediate work that needs to be done here, but I will provide more context on this once I have all of the data compiled.
