Although some slowdown was fixed through https://github.com/dotnet/coreclr/issues/18705 , throwing exceptions while a debugger is attached is still almost an order of magnitude slower on macOS than it is on windows.
We use CancellationToken.ThrowIfCancellationRequested()
to throw the exception up through large dependency injection call stacks, which as far as I see is the recommended usage of cancellation tokens.
Here's my testing of this issue on various platforms/frameworks:
| Method | Mean | Error | StdDev |
|-------------------------------- |---------:|----------:|----------:|
| macOS // .NET Core 2.2 | 124.3 ms | 2.194 ms | 1.945 ms |
| macOS // .NET Core 3.0 | 123.8 ms | 1.477 ms | 1.233 ms |
| macOS // .NET 4.7.2 (mono) | 3.552 us | 0.0296 us | 0.0276 us |
| Windows // .NET Core 2.2 | 17.06 ms | 0.3291 ms | 0.8669 ms |
| Windows // .NET Core 3.0 | 17.08 ms | 0.3380 ms | 0.7629 ms |
| Windows // .NET 4.7.2 | 14.65 ms | 0.2151 ms | 0.1907 ms |
These/similar results can be reproduced via the following sample code:
for (; ; )
{
var sw = Stopwatch.StartNew();
try
{
throw new Exception();
}
catch
{
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
}
Callstack:
2725 RunMain(MethodDesc*, short, int*, PtrArray**) (in libcoreclr.dylib) + 724 [0x105c18294]
2725 MethodDescCallSite::CallTargetWorker(unsigned long const*, unsigned long*, int) (in libcoreclr.dylib) + 905 [0x105d52d39]
2725 CallDescrWorkerInternal (in libcoreclr.dylib) + 124 [0x105ef2d37]
2725 ??? (in <unknown binary>) [0x10d1d881f]
2725 IL_Throw(Object*) (in libcoreclr.dylib) + 700 [0x105dc44ac]
2723 DispatchManagedException(PAL_SEHException&, bool) (in libcoreclr.dylib) + 134 [0x105e5d256]
2722 UnwindManagedExceptionPass1(PAL_SEHException&, _CONTEXT*) (in libcoreclr.dylib) + 347 [0x105e5cfeb]
2722 ProcessCLRException (in libcoreclr.dylib) + 1736 [0x105e57fd8]
2722 ExceptionTracker::ProcessOSExceptionNotification(_EXCEPTION_RECORD*, _CONTEXT*, _DISPATCHER_CONTEXT*, unsigned int, StackFrame, Thread*, ExceptionTracker::StackTraceState) (in libcoreclr.dylib) + 1308 [0x105e58f0c]
1406 ExceptionTracker::ProcessManagedCallFrame(CrawlFrame*, StackFrame, StackFrame, _EXCEPTION_RECORD*, ExceptionTracker::StackTraceState, unsigned long, unsigned int, unsigned int, StackFrame) (in libcoreclr.dylib) + 5787 [0x105e5b52b]
| 1406 Debugger::FirstChanceManagedExceptionCatcherFound(Thread*, MethodDesc*, unsigned long, unsigned char*, EE_ILEXCEPTION_CLAUSE*) (in libcoreclr.dylib) + 247 [0x105bdba67]
| 1404 Debugger::SendCatchHandlerFound(Thread*, FramePointer, unsigned long, unsigned int) (in libcoreclr.dylib) + 846 [0x105bdbdde]
| 1404 Thread::RareDisablePreemptiveGC() (in libcoreclr.dylib) + 230 [0x105e356f6]
| 1404 Thread::RareEnablePreemptiveGC() (in libcoreclr.dylib) + 239 [0x105e35f6f]
| 1404 Thread::WaitSuspendEventsHelper() (in libcoreclr.dylib) + 208 [0x105e37400]
| 1404 CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*) (in libcoreclr.dylib) + 201 [0x105e31ed9]
| 1404 WaitForSingleObjectEx (in libcoreclr.dylib) + 77 [0x105b8437d]
| 1404 CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int, int) (in libcoreclr.dylib) + 1937 [0x105b84161]
| 1404 CorUnix::CPalSynchronizationManager::BlockThread(CorUnix::CPalThread*, unsigned int, bool, bool, CorUnix::ThreadWakeupReason*, unsigned int*) (in libcoreclr.dylib) + 464 [0x105b7f8a0]
| 1404 CorUnix::CPalSynchronizationManager::ThreadNativeWait(CorUnix::_ThreadNativeWaitData*, unsigned int, CorUnix::ThreadWakeupReason*, unsigned int*) (in libcoreclr.dylib) + 363 [0x105b7fc1b]
| 1404 _pthread_cond_wait (in libsystem_pthread.dylib) + 722 [0x7fff790d456e]
| 1404 __psynch_cvwait (in libsystem_kernel.dylib) + 10 [0x7fff7901b86a]
1316 ExceptionTracker::ProcessManagedCallFrame(CrawlFrame*, StackFrame, StackFrame, _EXCEPTION_RECORD*, ExceptionTracker::StackTraceState, unsigned long, unsigned int, unsigned int, StackFrame) (in libcoreclr.dylib) + 1660 [0x105e5a50c]
1316 Debugger::FirstChanceManagedException(Thread*, unsigned long, unsigned long) (in libcoreclr.dylib) + 36 [0x105bdb964]
1316 Debugger::SendException(Thread*, bool, unsigned long, unsigned long, bool, bool, bool, _EXCEPTION_POINTERS*) (in libcoreclr.dylib) + 535 [0x105bdb7f7]
1315 Debugger::SendExceptionEventsWorker(Thread*, bool, bool, bool, unsigned long, FramePointer, bool) (in libcoreclr.dylib) + 1295 [0x105bdb2af]
1315 Thread::RareDisablePreemptiveGC() (in libcoreclr.dylib) + 230 [0x105e356f6]
1315 Thread::RareEnablePreemptiveGC() (in libcoreclr.dylib) + 239 [0x105e35f6f]
1315 Thread::WaitSuspendEventsHelper() (in libcoreclr.dylib) + 208 [0x105e37400]
1315 CLREventBase::WaitEx(unsigned int, WaitMode, PendingSync*) (in libcoreclr.dylib) + 201 [0x105e31ed9]
1315 WaitForSingleObjectEx (in libcoreclr.dylib) + 77 [0x105b8437d]
1315 CorUnix::InternalWaitForMultipleObjectsEx(CorUnix::CPalThread*, unsigned int, void* const*, int, unsigned int, int, int) (in libcoreclr.dylib) + 1937 [0x105b84161]
1315 CorUnix::CPalSynchronizationManager::BlockThread(CorUnix::CPalThread*, unsigned int, bool, bool, CorUnix::ThreadWakeupReason*, unsigned int*) (in libcoreclr.dylib) + 464 [0x105b7f8a0]
1315 CorUnix::CPalSynchronizationManager::ThreadNativeWait(CorUnix::_ThreadNativeWaitData*, unsigned int, CorUnix::ThreadWakeupReason*, unsigned int*) (in libcoreclr.dylib) + 363 [0x105b7fc1b]
1315 _pthread_cond_wait (in libsystem_pthread.dylib) + 722 [0x7fff790d456e]
1315 __psynch_cvwait (in libsystem_kernel.dylib) + 10 [0x7fff7901b86a]
Info:
$ dotnet --list-sdks:
2.2.107 [/usr/local/share/dotnet/sdk]
3.0.100-preview5-011568 [/usr/local/share/dotnet/sdk]
$ mono --version:
Mono JIT compiler version 5.18.0.225 (2018-08/bac9fc1f889 Fri Dec 21 11:33:29 EST 2018)
Windows:
$ dotnet --list-sdks:
2.2.204 [C:\Program Files\dotnet\sdk]
3.0.100-preview5-011568 [C:\Program Files\dotnet\sdk]
A little bit more detail on this one, I've found that CPalSynchronizationManager::BlockThread
is always taking most of the time.
Among other things, I found this can be induced via Debugger.NotifyOfCrossThreadDependency()
.
Test:
private static Stopwatch stopwatch;
static void Main(string[] args)
{
new Thread(() =>
{
for (;;)
Debugger.NotifyOfCrossThreadDependency();
}).Start();
stopwatch = Stopwatch.StartNew();
for (;;)
outputToConsole();
}
private static long invocations;
private static void outputToConsole()
{
++invocations;
if (invocations % 100000 == 0)
{
Console.WriteLine(
$"{invocations} invocations, {(int)(stopwatch.Elapsed.Ticks / (float)invocations * 100)} ns/invoc");
}
}
Results:
| Platform (Runtime) | ns per invocation |
| ------------------- | ----------------- |
| macOS (.NET Core 2.1.0, Not attached) | 22ns |
| macOS (.NET Core 2.1.0, Attached) | 280ns |
| macOS (.NET Core 2.2.0, Not Attached) | 22ns |
| macOS (.NET Core 2.2.0, Attached) | 240ns |
| macOS (.NET Core 3.0.0-preview6-27804-01, Not Attached) | 22ns |
| macOS (.NET Core 3.0.0-preview6-27804-01, Attached) | 230ns |
| macOS (mono 5.20.1, Not Attached) | 12ns |
| macOS (mono 5.20.1, Attached) | 17ns |
| Windows (.NET Core 2.2.0, Not Attached) | 28ns |
| Windows (.NET Core 2.2.0, Attached) | 35ns |
| Windows (.NET 4.7.2, Not Attached) | 27ns |
| Windows (.NET 4.7.2, Attached) | 28ns |
All differences go away as soon as the separate thread throwing debugger notifications is removed from the equation.
As context for why this is important for us: we're working on a game and see visible frame drops seconds in length whenever any async state machine (e.g. web requests) is built or an exception (e.g. cancellation) is thrown. Debugging is quite painful for us because of this.
Any chance this could get some attention? It is heavily reducing efficiency in our workflow (we basically can never attach a debugger on macOS unless utmost necessary due to the performance overhead)
We will definitely take a look at this again. Probably for 3.1 or 5.0.
@mikem8361
This bug makes debugging impossible on medium-large projects with many background workers.
dotnet --version: 2.2.402
OS: MacOS High Sierra (10.13.6)
Tested with VS Code.
This is response times for test project (Hello World app) WITH debugger attached:
This is response times WITHOUT debugger attached:
This is just "hello world" application with ~20 background workers that does nothing except throw + catch exception and sleep for 1 sec.
On my project (angular app) single ajax request takes about 10 seconds (+- 5 seconds).
Test project:
src.zip
@tommcdon, we need to put this issue high on the priority list.
Hi all. Are there any chance it could be fixed in near previews? Thanks.
Just reporting in since I've recently changed dev environments, and I can now confirm the issue is reproducible on linux. Using my sample code from above
| Platform (Runtime) | ns per invocation |
| ------------------- | ----------------- |
| Linux (.NET Core 3.1.0, Not attached) | 14ns |
| Linux (.NET Core 3.1.0, Attached) | 204ns |
| Linux (mono 6.4.0, Not attached) | 6ns |
| Linux (mono 6.4.0, Attached) | 6ns |
To be clear: calling Debugger.NotifyOfCrossThreadDependency while a debugger is attached is going to be much much slower than when a debugger isn't attached on all platforms. I am actually surprising that it is as fast as your are seeing. #38736 tracks this.
This issue is about the fact that on macOS it is extra slow compared to Linux/Windows.
I'm finding it's equally as slow on Linux as macOS. It's only fast on Windows.
Both my benchmarks above and the original issue which attempted to resolve this (https://github.com/dotnet/runtime/issues/10605) will confirm this.
On Linux, did you attach a debugger to an already running process or launch?
That was attached at launch.
Sorry, I am not quite following what you mean. Is the target process a child process of the debugger?
Yes, it's a child process. However this is also replicable by attaching to an already running process.
What debugger are you using? Are you testing exceptions or NotifyOfCrossThreadDependency? Are you in a container? Or is this a VM?
On Linux, the runtime will try to use the Linux APIs to read memory from another process, which should give a significant performance advantage, but only for launch, and only if PTrace is enabled.
@gregg-miskelly the code I've used can be found above: https://github.com/dotnet/runtime/issues/12706#issuecomment-505745822 It uses NotifyOfCrossThreadDependency
. Not a container or VM.
As for debugger, I can reproduce on both VSCode (vsdbgui), and JetBrains Rider.
Edit: In our real application, we're encountering this when exceptions are being thrown. I'd previously benchmarked them to have the same increase in time so I assumed they were related, but it seems with the above example NotifyOfCrossThreadDependency
spends most of the time in the CLR, whereas changing it to throw and catch exceptions spends most of the time in kernel, reading and writing to the pipes.
Via NotifyOfCrossThreadDependency
:
Via throwing/catching exceptions:
So maybe these are separate issues, as you say...
Closing as runtime side of work is merged: #39804
This is intended to be included in 5.0 rc1, correct? I'm not seeing any improvement, so wanted to confirm before investigating further.
For this runtime side work to have any effect, Visual Studio鈥檚 engine binary need some special signing too. They have it scheduled but I not sure when it will be available.
/cc: @gregg-miskelly
@WardenGnaw is planning to look at it soon. We hope to have bits out before .NET 5 GA.
@mikem8361 on VS's side, is it entitlement and signing that's necessary? Or is notarization also required?
Only entitlement and signing. Notarization isn't required.
macOS signed with entitlements debugger bits are now available at aka.ms/getvsdbgsh.
It will be available on 16.8.11005.1 and higher.
Most helpful comment
@tommcdon, we need to put this issue high on the priority list.