Runtime: Support .NET 5 on Apple Silicon with Rosetta 2 emulation

Created on 18 Nov 2020  ·  21 Comments  ·  Source: dotnet/runtime

Apple has announced plans to transition its Mac hardware line to a new Arm64-based chip that they refer to as “Apple Silicon”.

Initial .NET support will be through .NET running on the Rosetta 2 emulator. Longer term, native support for Apple Silicon is planned for .NET 6.

While it is hoped that Rosetta 2 emulation will just work, the .NET runtime is complicated and real issues will make this a non-trivial task.

Current known issues

  • [ ] Apple Silicon uses a 16K memory page size. The .NET 5 stack probe code doesn't handle this yet. #45226. Per Apple this only affects the DTK and is fixed on M1 Silicon.
  • [ ] Rosetta 2 emulation crashes with a fatal failure when thread_get_state is called with x86_FLOAT_STATE64. This is because the emulator does not emulate AVX support, but the function should simply return an error.
  • [ ] Rosetta 2 emulation doesn't populate exceptionState.__trapno for kernel entries other than hardware exceptions (for example, syscalls). This means we fail to inject the code necessary for garbage collection and sometimes deadlock.
  • [ ] With #45226 and https://github.com/janvorli/runtime/commit/aee81acd99b9c0e6a406bad3b98c278669c7cc67 applied, 19 runtime tests that pass on native macOS x64 fail under Rosetta 2 emulation.
Team Epic arch-arm64 area-VM-coreclr os-mac-os-x-big-sur runtime-coreclr

All 21 comments

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

/cc @richlander @janvorli @mangod9 @sandreenko

I've modified the details on the __trapno issue.

@richlander Is this issue also for .NET Core 3.1?

.NET Core 3.1 support would be highly appreciated. It is the current LTS version, and therefore the version AWS Lambda supports as a runtime until the next LTS version (.NET 6).

@sdmaclea I'm the main developer of Rosetta 2, and I'm particularly interested in the last item that you mention:

Several hundred runtime tests are failing under Rosetta 2 emulation which pass on macOS native x64

What's the simplest set of commands I can run on a runtime checkout to see these failures? I'm not completely clear on which combination of Release/Debug components I should be using.

@zwarich So the exact number of failing tests was actually 76. Rerunning the tests with #45226 fixed most of them.

Most of the remaining failed tests look like they are possibly related to division emulation. These are the currently failing tests:

# Divide modulo failures
./artifacts/tests/coreclr/OSX.x64.Release/JIT/IL_Conformance/Old/Conformance_Base/rem_i8/rem_i8.sh
./artifacts/tests/coreclr/OSX.x64.Release/JIT/IL_Conformance/Old/Conformance_Base/div_i8/div_i8.sh
./artifacts/tests/coreclr/OSX.x64.Release/JIT/Directed/coverage/oldtests/ovfldiv1_il_d/ovfldiv1_il_d.sh
./artifacts/tests/coreclr/OSX.x64.Release/JIT/Directed/coverage/oldtests/ovflrem1_il_d/ovflrem1_il_d.sh
./artifacts/tests/coreclr/OSX.x64.Release/JIT/Directed/coverage/oldtests/ovfldiv1_il_r/ovfldiv1_il_r.sh
./artifacts/tests/coreclr/OSX.x64.Release/JIT/Directed/coverage/oldtests/ovflrem1_il_r/ovflrem1_il_r.sh
./artifacts/tests/coreclr/OSX.x64.Release/JIT/CodeGenBringUpTests/DivConst_r/DivConst_r.sh
./artifacts/tests/coreclr/OSX.x64.Release/JIT/CodeGenBringUpTests/ModConst_r/ModConst_r.sh
./artifacts/tests/coreclr/OSX.x64.Release/JIT/CodeGenBringUpTests/DivConst_ro/DivConst_ro.sh
./artifacts/tests/coreclr/OSX.x64.Release/JIT/CodeGenBringUpTests/ModConst_do/ModConst_do.sh
./artifacts/tests/coreclr/OSX.x64.Release/JIT/CodeGenBringUpTests/DivConst_d/DivConst_d.sh
./artifacts/tests/coreclr/OSX.x64.Release/JIT/CodeGenBringUpTests/ModConst_ro/ModConst_ro.sh
./artifacts/tests/coreclr/OSX.x64.Release/JIT/CodeGenBringUpTests/DivConst_do/DivConst_do.sh
./artifacts/tests/coreclr/OSX.x64.Release/JIT/CodeGenBringUpTests/ModConst_d/ModConst_d.sh
./artifacts/tests/coreclr/OSX.x64.Release/JIT/jit64/regress/vsw/543645/test/test.sh

# Rosetta 2 CPU ID is not recognized (not surprising).
./artifacts/tests/coreclr/OSX.x64.Release/JIT/HardwareIntrinsics/X86/X86Base/CpuId_r/CpuId_r.sh
./artifacts/tests/coreclr/OSX.x64.Release/JIT/HardwareIntrinsics/X86/X86Base/CpuId_ro/CpuId_ro.sh

# assertion failed: GPR thread_set_state is unsupported while in sa_tramp
./artifacts/tests/coreclr/OSX.x64.Release/JIT/Regression/CLR-x86-JIT/V1.1-M1-Beta1/b143840/b143840/b143840.sh
./artifacts/tests/coreclr/OSX.x64.Release/baseservices/exceptions/regressions/V1/SEH/VJ/ExternalException/ExternalException.sh

This is the simplest set of commands to build and reproduce the failed tests:

# build on macos x64 (Intel)

# checkout https://github.com/dotnet/runtime/tree/release/5.0
git checkout origin/release/5.0

# Install build dependencies (once) using homebrew 
brew bundle --no-lock --file eng/Brewfile

# Apply patches as needed
# See https://github.com/dotnet/runtime/pull/45226
# Possibly see https://github.com/dotnet/runtime/issues/45222#issuecomment-734016047

# build the .NET Core runtime
./build.sh clr+libs -x64 -c release

# build the tests 
src/coreclr/build-test.sh -release -priority1

# compress tests on build host
tar -zcf rosetta5_4k.tgz artifacts/tests/coreclr/OSX.x64.Release artifacts/bin/coreclr/OSX.x64.Release artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root artifacts/obj/coreclr/OSX.x64.Release/tests

# uncompress on Apple Silicon
tar -xf rosetta5_4k.tgz
# An individual test can be run like this
export test=./artifacts/tests/coreclr/OSX.x64.Release/JIT/IL_Conformance/Old/Conformance_Base/rem_i8/rem_i8.sh
chmod u+x $test
$test -coreroot=$PWD/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root
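To re-run the whole list of failing tests rather than one at a time, a small loop along these lines may help. It is only a sketch: `run_test_list` and the list-file layout (one script path per line, `#` lines ignored) are hypothetical and not part of the runtime repository. The only convention taken from the commands above is that each test script accepts `-coreroot=...`.

```shell
#!/bin/sh
# Hypothetical helper: run every test script named in a list file and
# report PASS/FAIL per script. Blank lines and lines starting with '#'
# are skipped, so an annotated list can be used unchanged.
run_test_list() {
    list_file=$1
    core_root=$2
    failed=0
    while IFS= read -r test_script; do
        case "$test_script" in
            ''|'#'*) continue ;;
        esac
        chmod u+x "$test_script"
        # Redirect stdin so the test cannot consume the rest of the list.
        if "$test_script" -coreroot="$core_root" </dev/null >/dev/null 2>&1; then
            echo "PASS $test_script"
        else
            echo "FAIL $test_script"
            failed=$((failed + 1))
        fi
    done < "$list_file"
    echo "failed: $failed"
    # Return success only if nothing failed.
    [ "$failed" -eq 0 ]
}
```

It could be invoked as `run_test_list failing-tests.txt "$PWD/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root"`, where `failing-tests.txt` is a file you create from the list above.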

@sdmaclea Thanks for the informative reply. I'll try to reproduce these failures on my own.

As far as #45226 is concerned, Rosetta 2 uses 4 KB pages on M1. The use of 16 KB pages was a limitation of the DTK. That said, you probably need the same fix for a native port.
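To check which case applies on a given machine, something like the following sketch can be run from the shell in question. It assumes the macOS `sysctl.proc_translated` key is available to report Rosetta translation. On systems without that key it prints "unknown". The function names are illustrative only.

```shell
#!/bin/sh
# Print the memory page size seen by the current process.
# Per the discussion above, 4096 is expected under Rosetta 2 on M1,
# while the DTK (and native arm64 processes) use 16384.
page_size() {
    getconf PAGESIZE
}

# Print whether the current process is translated by Rosetta 2.
# Assumption: the macOS sysctl key "sysctl.proc_translated" exists;
# if the query fails for any reason, report "unknown".
rosetta_status() {
    if translated=$(sysctl -n sysctl.proc_translated 2>/dev/null); then
        case "$translated" in
            1) echo "translated" ;;
            0) echo "native" ;;
            *) echo "unknown" ;;
        esac
    else
        echo "unknown"
    fi
}

echo "arch: $(uname -m), page size: $(page_size), rosetta: $(rosetta_status)"
```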

# Rosetta 2 CPU ID is not recognized (not surprising).
./artifacts/tests/coreclr/OSX.x64.Release/JIT/HardwareIntrinsics/X86/X86Base/CpuId_r/CpuId_r.sh
./artifacts/tests/coreclr/OSX.x64.Release/JIT/HardwareIntrinsics/X86/X86Base/CpuId_ro/CpuId_ro.sh

These actually pass on an M1 machine (rather than a DTK).

# assertion failed: GPR thread_set_state is unsupported while in sa_tramp
./artifacts/tests/coreclr/OSX.x64.Release/JIT/Regression/CLR-x86-JIT/V1.1-M1-Beta1/b143840/b143840/b143840.sh
./artifacts/tests/coreclr/OSX.x64.Release/baseservices/exceptions/regressions/V1/SEH/VJ/ExternalException/ExternalException.sh

I was running without the activations-via-signals patch, where you see a different issue, which can be worked around by using X86_FLOAT_STATE64 rather than X86_FLOAT_STATE.

This assertion indicates that one thread was trying to use thread_set_state on a thread that was logically in the first x86 instruction in userspace after signal delivery. This is not supported (at the moment) by Rosetta. However, it is also difficult to write a correct program that does this. In this case, the target thread was probably delivering INJECT_ACTIVATION_SIGNAL. The .NET runtime specifies that this signal be masked in its own handler, so the signal would be masked at this point. If you call thread_set_state on this thread, it will execute code elsewhere with the signal indefinitely masked, preventing further delivery of INJECT_ACTIVATION_SIGNAL. Of course, you could try to reset the signal mask in every function where you redirect control flow. In general, it's probably best to commit to using either Mach exceptions or BSD signals and not mix them too much.

There might be other issues that I am not aware of, but I have investigated all of the ones mentioned here and I believe they can all be addressed in Rosetta. There should be no workarounds required in .NET (at least on M1, as opposed to the DTK).

@sdmaclea What commands/scripts should I use to test further whether .NET is working correctly, e.g. longer tests and stress tests?

@zwarich thank you so much for all the insight! Regarding the INJECT_ACTIVATION_SIGNAL vs thread_set_state issue: with the change to use signals, we use thread_set_state to redirect a thread only in case of a hardware exception (access violation, division by zero, ...) on that thread. From what you've said, it seems that such a mechanism cannot work on Rosetta in the presence of any signal handling. So even if we didn't use a signal for activation injection, another process could still send a signal to the process running the .NET app and trigger the same assert if it arrived at the same time as a hardware exception. Such signals include SIGCHLD, SIGINT, SIGTERM, etc.
Just to make sure I fully understand what's going on, let me summarize it:

  • A signal is sent to a thread.
  • A hardware exception occurs on that thread, for example by dereferencing a null reference in managed code.
  • The OS suspends the thread and sends a message to an exception port.
  • The OS attempts to deliver the signal to the target thread. Since that thread is suspended, the OS doesn't change its context to point to the signal handler (thread_get_state still returns the context of our code), but internally it sets some flag or similar indicating that once the thread returns to user mode, it will start executing the signal handler.
  • Our exception handling thread receives the hardware exception message from the exception port. It tries to change the faulting thread's context to our exception handler function using thread_set_state and then resume the thread. But thread_set_state asserts.

Is there a way to work around this issue so that we can still redirect the failing thread to our code that can handle the exception? It seems that if thread_set_state returned an error code instead of asserting, we could just resume the thread without changing the context. The signal handler would then execute, and after it returns, the instruction that triggered the hardware exception would be re-executed. The whole process above would repeat, but this time thread_set_state would succeed.
Or is there a way to detect that the thread is about to handle a signal and skip the thread_set_state call? I was originally thinking that thread_get_state would return the context of the signal handler, but we have verified that this is not the case: it still returns the context of the hardware exception point.

@zwarich Thanks.

We have lots of different stress test modes. We will try to light them up in CI when we get sufficient M1 hardware.

I am not sure which of them to point you at. There are a lot of tests. Many of them are focused on functional correctness and stressing the JIT or the GC. I am not sure how well they would exercise Rosetta emulation.

  • @Maoni0 Do you recommend the full GC stress tests? The long running GC tests? Where would the best instructions currently reside?
  • @BruceForstall Do you recommend a particular set of JIT stress tests?
  • @danmosemsft Are there good instructions for running the Libraries tests manually?

One of the most difficult tests is actually getting the SDK stable enough to build a large project. So building the runtime on M1 might be a good smoke test. This was failing with a deadlock 90% of the time (presumably due to the X86_FLOAT_STATE64 issue).

So you could build the native runtime on M1. The instructions are almost the same as the ones I gave above.

# build .NET 6.0 master on macOS arm64 (M1)
# checkout https://github.com/dotnet/runtime
git checkout origin/master

# Install build dependencies (once) using homebrew 
# Last I checked I had to hack this a bit to get it to work on DTK
# Last time I cheated and did 
#`arch -arch x86_64 brew bundle --no-lock --file eng/Brewfile` 
# for at least a subset of the dependencies
brew bundle --no-lock --file eng/Brewfile

# Apply patches as needed
# See https://github.com/dotnet/runtime/pull/45226
# Possibly see https://github.com/dotnet/runtime/issues/45222#issuecomment-734016047

# build the .NET Core runtime
./build.sh clr+libs -arm64 -c release

# build the tests 
src/coreclr/build-test.sh -release -arm64 -priority1

I'd be surprised if JIT stress modes would exhibit uniquely challenging behavior on Rosetta. However, to run the most common JIT stress, set environment variable COMPlus_JitStress to 1 or 2 before running the tests. This requires you to be using a Checked (or Debug) build configuration (not Release), so, e.g., ./build.sh clr+libs -arm64 -c checked.

@sandreenko Can also advise.

GC Stress would likely be more "stressful" to the system. The two most common settings are setting environment variable COMPlus_GCStress to 3 or C. This works on a Release build as well as Checked.
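As a rough sketch of how these knobs could be applied to a single test run without leaking into the rest of the shell session, a wrapper like the one below sets the variable only for the command it launches. `with_stress` is a hypothetical helper name. The variable names and example values (JitStress 1 or 2, GCStress 3 or C) are the ones mentioned in this thread, and the JIT stress mode still needs a Checked or Debug build as noted above.

```shell
#!/bin/sh
# Hypothetical helper: run a command with one stress variable set only
# for that command, leaving the calling shell's environment untouched.
#   with_stress jit LEVEL command...   sets COMPlus_JitStress=LEVEL
#   with_stress gc  LEVEL command...   sets COMPlus_GCStress=LEVEL
with_stress() {
    kind=$1
    level=$2
    shift 2
    case "$kind" in
        jit) env COMPlus_JitStress="$level" "$@" ;;
        gc)  env COMPlus_GCStress="$level" "$@" ;;
        *)
            echo "usage: with_stress jit|gc LEVEL command..." >&2
            return 2
            ;;
    esac
}
```

For example, `with_stress gc 3 "$test" -coreroot="$PWD/artifacts/tests/coreclr/OSX.x64.Release/Tests/Core_Root"` would run one of the test scripts from earlier in the thread under GC stress.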

btw, running the tests locally can be done with python ./src/tests/run.py and passing some number of arguments, depending on the setup (use -h to see the options).

For GC, you'd want to run the GC functional tests plus stress, which is documented here (it looks like this file was not moved when the test sources were moved):

https://github.com/dotnet/coreclr/blob/release/3.0/tests/src/GC/Stress/stress_run_readme.txt

This should be in the same place in the new test source location.

My bad, I forgot that the file was moved, but to a different location. The file, which is now at docs/workflow/testing/coreclr/gc-stress-run-readme.md, needs to be updated, and @cshung is working on updating it: https://github.com/dotnet/runtime/pull/45392.

The two most common settings are setting environment variable COMPlus_GCStress to 3 or C

Per @janvorli, GC stress mode C is not yet supported on macOS (#42025).

@janvorli Thanks for bringing this up. I took a look at what was happening with your signals-via-activations branch (I needed to force HAVE_UCONTEXT_T to get it to compile, is this expected?). It turns out that it was a different issue than I was expecting.

In this case, a signal is being delivered to a thread that is waiting in the kernel for the reply from the exception handler. On an ordinary native binary (x86 or ARM), this should delay signal delivery until the reply from the exception handler, which does not happen when the thread is merely suspended (as I believe you are seeing in #44498). This is not happening in all cases under Rosetta, for reasons intertwined with the assertion that you see. However, it should be possible to fix all of these issues, so that both the __trapno approach and the signals approach work under Rosetta as well as they do natively.

Hi folks, when trying to debug a Hello, World console app with .NET 5 I get the following StackOverflowException, which is around the thread_set_state method.

Just adding this exception so you have more information.

I've tried debugging across most major IDEs: JetBrains Rider, Visual Studio for Mac, and VS Code. All end with the same exception, so I can safely rule out any debugger specific implementations.

Stack overflow.
   at System.Collections.HashHelpers.GetPrime(Int32)
   at System.Collections.Generic.Dictionary`2[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Initialize(Int32)
   at System.Collections.Generic.Dictionary`2[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]..ctor(Int32, System.Collections.Generic.IEqualityComparer`1<System.__Canon>)
   at System.AppContext.Setup(Char**, Char**, Int32)
RestoreState: 1124: thread_set_state(float) (os/kern) invalid argument

@ViktorHofer can help answer the question above about running libraries tests.

My Core/Angular project on M1 isn't running any longer.

Initially, I was only able to get it to run by running with a custom configuration (however, I didn't change any of the configs).

After a VS update, it will no longer run at all. It compiles fine, but I cannot run or debug the Core code. The Angular project will bootstrap itself and run, but that's it. The Angular project runs either on its own or when I try to start the Core project.

No errors to help track down the issue.

If it's any consolation, .NET Core isn't quitting unexpectedly constantly anymore.

@zwarich do you think these changes to Rosetta 2 will be part of 11.1?

@ilushka85 I can't comment on upcoming macOS releases or precise timelines for fixes, but I will try to post here when a beta or release containing them becomes available.
