In .NET 5, invoking hostfxr_get_runtime_delegate from a non-main thread appears to leave a background task running, consuming a full CPU thread.
Specifically, the issue can be reproduced in the following fashion on Windows:
int main(int argc, char* argv[])
{
// load_runtime sets up the runtime, calling `hostfxr_get_runtime_delegate` down the road
auto handle = CreateThread(nullptr, 0, &load_runtime, 0, 0, nullptr);
Sleep(INFINITE); // Stall the program to observe the effect.
}
This does not occur if the runtime is initialised on the main thread.
Minimal reproduction is available at Reloaded.Core.Bootstrap/runtimebug.
Simply execute Reloaded.Core.Bootstrap.Example and observe CPU utilisation once the program begins to sleep.
The offending function call that triggers the issue can be found at Reloaded.Core.Bootstrap/CoreCLR.cpp#L71.
(Requires Visual Studio 16.7 preview or newer, with Desktop development with C++ workload as well as the latest .NET 5 SDK).
To trigger the correct, intended behaviour (no loop/bug) simply load the runtime on main thread. i.e. replace
initializeThreadHandle = CreateThread(nullptr, 0, &load_bootstrapper_async, 0, 0, nullptr);
WaitForSingleObject(initializeThreadHandle, INFINITE);
with
load_bootstrapper_async(0);
in the example's entry point.
I have set it up this way to make control flow comparisons easier between intended and unintended behaviour should that be helpful.
Runtime: 5.0.0-preview.6
OS: Win 10 2004 (Build 19041.329)
Architecture: x86, x64
This is a regression from .NET Core 3.X.
(i.e. Code runs as expected all the way from 3.0 Preview 6 when it became first available to the latest 3.1 release).
I have not yet identified the first .NET 5 Preview/build with the issue intact.
.NET 5:

netcoreapp3.1:

netcoreapp3.0:

Tagging subscribers to this area: @vitek-karas, @swaroop-sridhar, @agocke
Notify danmosemsft if you want to be subscribed.
Thanks a lot for the repro @Sewer56 . I debugged into it a little bit and it seems to be related to the diagnostic server thread. It loops in this callstack:
> coreclr.dll!IpcStreamFactory::GetNextAvailableStream(void(*)(const char *, unsigned int) callback) Line 166 C++
coreclr.dll!DiagnosticServer::DiagnosticsServerThread(void * __formal) Line 50 C++
kernel32.dll!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown
It calls the IpcStream::DiagnosticsIpc::Poll which "successfully" waits (seems to be no wait time actually) on the handle, then gets a stream which is always NULL - that causes the caller to loop and try again... and so on.
I don't know what's different running it from managed exe versus hosted via the hosting APIs - obviously a simple console app doesn't repro this.
Tagging subscribers to this area: @tommcdon
Notify danmosemsft if you want to be subscribed.
I don't know what's different running it from managed exe versus hosted via the hosting APIs - obviously a simple console app doesn't repro this.
Indeed, would be interesting to know, even if at a high level.
I'd be interested digging into the runtime internals myself someday.
Thanks for reaching out so quickly.
On an unrelated note.
Seems there was something I forgot to mention in the repro; this should be fairly obvious and implicit from the issue description but I should mention just in case.
To trigger the correct, intended behaviour (no loop) simply load the runtime on main thread. i.e. replace
initializeThreadHandle = CreateThread(nullptr, 0, &load_bootstrapper_async, 0, 0, nullptr);
WaitForSingleObject(initializeThreadHandle, INFINITE);
with
load_bootstrapper_async(0);
in the example's entry point. I edited the opening post to reflect this.
Explicitly, I set it up this way in the repro so it's easier to test & compare control flow.
@dotnet/dotnet-diag @josalem
Interesting! I haven't had a chance to try the repro since I'm on my Mac right now.
@vitek-karas when you run this under the debugger, does a NamedPipe with a name like dotnet-diagnostic-<pid> show up for the application? IpcStreamFactory::GetNextAvailableStream returning nullptr indicates a failure state of the underlying NamedPipe (on Windows). Could you give some more details about what's happening inside the call to GetNextAvailableStream and Poll? Alternatively, could you turn on StressLog with the diagnostics log facility (0x00001000) and post the result? I'll try to repro this tomorrow.
I put this under a debugger this afternoon. The server is being put into an error state where the overlapped IO is continually returning ERROR_OPERATION_ABORTED. The description for that error states that
The I/O operation has been aborted because of either a thread exit or an application request.
In other words, the fact that you loaded the runtime on one thread but ended that thread broke the asynchronous IO. The abort is being recognized here at line 327:
which unfortunately doesn't set fSawError if the GetConnectedStream callback returns nullptr. In that case we never reset the asynchronous call to ConnectNamedPipe, so the pipe always comes back marked as signaled, and then fails in the same way.
This should be fixable by special casing the abort case and/or resetting the connection if the callback returns nullptr.
Most helpful comment
I put this under a debugger this afternoon. The server is being put into an error state where the overlapped IO is continually returning
ERROR_OPERATION_ABORTED. The description for that error states thatIn other words, the fact that you loaded the runtime on one thread but ended that thread broke the asynchronous IO. The abort is being recognized here at line 327:
https://github.com/dotnet/runtime/blob/50f82db26eaa743074fea538063abb4fec0adb02/src/coreclr/src/vm/ipcstreamfactory.cpp#L324-L330
which unfortunately doesn't set
fSawErrorif theGetConnectedStreamcallback returnsnullptr. In that case we never reset the asynchronous call toConnectNamedPipe, so the pipe always comes back marked as signaled, and then fails in the same way.This should be fixable by special casing the abort case and/or resetting the connection if the callback returns
nullptr.