Today I built an updated Yocto image that contains four SDK-based projects. The publish step of each of these four projects hung during the build.
Yocto builds the projects in parallel. Each project's build first produces its client-side packages with node and then runs the equivalent of dotnet publish -c Release -o … -f net471. Each project has its own private source tree (a copy of the actual Git repository), so the projects share no code files and no bin or obj folders, but they do use a common NuGet cache. The build logs show that the publish commands got as far as printing the startup banner
Microsoft (R) Build Engine version 15.7.179.6572 for .NET Core
Copyright (C) Microsoft Corporation. All rights reserved.
and didn't do anything beyond that. According to pstree, the dotnet processes were still there, but they weren't doing anything. There were two kinds of dotnet processes: some were attached directly to the init process, and others had been spawned from the shell by the Yocto bitbake infrastructure.
After killing the entire build process, I restarted it and the publish commands hung again.
Running one of the publish commands manually on the command line, outside the build scripts, also hung.
I then ran killall dotnet to kill all existing dotnet processes, and everything went back to normal. I haven't seen a hang since.
I don't know.
dotnet --info output:
.NET Core SDK (reflecting any global.json):
 Version:   2.1.301
 Commit:    59524873d6

Runtime Environment:
 OS Name:     ubuntu
 OS Version:  16.04
 OS Platform: Linux
 RID:         ubuntu.16.04-x64
 Base Path:   /usr/share/dotnet/sdk/2.1.301/

Host (useful for support):
  Version: 2.1.1
  Commit:  6985b9f684

.NET Core SDKs installed:
  1.0.1 [/usr/share/dotnet/sdk]
  2.0.0 [/usr/share/dotnet/sdk]
  2.1.3 [/usr/share/dotnet/sdk]
  2.1.301 [/usr/share/dotnet/sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.1.1 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.1.1 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 1.0.4 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 1.1.1 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.0.0 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.0.4 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.1.1 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs:
  https://aka.ms/dotnet-download
cc @rainersigwald
@Tragetaschen To help narrow down one possible contributing factor: does this repro if you export MSBUILDDISABLENODEREUSE=1 after all prior dotnet instances have been terminated?
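A minimal sketch of that experiment for a single manual run, assuming a POSIX shell with killall available. The function name publish_without_node_reuse and its two arguments are hypothetical; the publish options are the ones from the report. MSBuild node reuse keeps worker node processes running after a build finishes so later builds can connect to them; dotnet processes reparented to init, as pstree showed, would be consistent with such leftover nodes, although the thread has not confirmed that.

```shell
# Hypothetical helper following the suggestion above for ONE project.
# Do not call it from parallel build steps: killall would also stop the
# dotnet processes of the other projects being published at the same time.
publish_without_node_reuse() {
  local project_dir=$1 out_dir=$2

  # 1. Terminate every prior dotnet instance (including idle MSBuild nodes).
  #    Errors are ignored so this also succeeds when none are running.
  killall dotnet 2>/dev/null || true

  # 2. Export the switch so it reaches dotnet and every MSBuild process it
  #    starts. It stays set for the rest of this shell session.
  export MSBUILDDISABLENODEREUSE=1

  # 3. Publish as in the report; the output directory comes from the caller.
  dotnet publish "$project_dir" -c Release -o "$out_dir" -f net471
}
```

Usage would be publish_without_node_reuse path/to/project path/to/output from an interactive shell.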
I'll run a stress test next week to see if it reproduces at all and then try the export.
It would also be great if you could enable and capture logging, which might give more clues as to where the hang is happening. Normally I'd suggest using a binary log by passing -bl to MSBuild, but since this is a hang, a text log might be safer in the face of unexpected process termination. Something like -filelog -fileloggerparameters:verbosity=diagnostic;LogFile=Publish.log would be helpful.
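One detail matters when that text log is requested from a shell script such as a bitbake task: the logger parameters contain a semicolon, which an unquoted shell command line treats as a command separator, so LogFile=Publish.log would run as a separate variable assignment instead of reaching MSBuild. Below is a sketch of a hypothetical publish_with_diagnostic_log helper that passes the whole switch as one quoted word. It spells the logger switches -fileLogger and -fileLoggerParameters, the names listed in MSBuild's command-line help, and assumes the remaining publish options from the report.

```shell
# Hypothetical helper: publish one project and write a diagnostic text log.
publish_with_diagnostic_log() {
  local project_dir=$1 out_dir=$2 log_file=$3

  # The logger parameters contain ';'. Quoting keeps the whole switch one
  # argument; unquoted, the shell would end the dotnet command at the ';'.
  dotnet publish "$project_dir" -c Release -o "$out_dir" -f net471 \
    -fileLogger "-fileLoggerParameters:verbosity=diagnostic;LogFile=$log_file"
}
```

For example, publish_with_diagnostic_log path/to/project path/to/output Publish.log; if the command hangs again, Publish.log is the file to attach to the issue.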
I tried reproducing it as-is, with no success. I ran a hundred iterations, each doing
killall dotnet -> build
and all of them finished successfully.
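The loop described above can be sketched as follows, assuming GNU coreutils timeout is available so that a hang ends the run instead of blocking it forever (timeout exits with status 124 when it stops the command). The function name stress_publish, its arguments, and any time limit passed to it are hypothetical; the build command is whatever the caller passes, for example the bitbake invocation.

```shell
# Hypothetical stress loop: kill all dotnet processes, run the build, repeat.
# Usage: stress_publish ITERATIONS TIME_LIMIT BUILD_COMMAND [ARGS...]
stress_publish() {
  local iterations=$1 limit=$2 i=1 status
  shift 2

  while [ "$i" -le "$iterations" ]; do
    # Start every round without leftover dotnet (and MSBuild node) processes.
    killall dotnet 2>/dev/null || true

    # timeout (GNU coreutils) stops a build that exceeds the limit and then
    # exits with status 124; "|| status=$?" keeps this safe under set -e.
    status=0
    timeout "$limit" "$@" || status=$?

    if [ "$status" -eq 124 ]; then
      echo "iteration $i: hung, killed after $limit" >&2
      return 1
    elif [ "$status" -ne 0 ]; then
      echo "iteration $i: failed with exit status $status" >&2
      return 1
    fi
    i=$((i + 1))
  done
  echo "all $iterations iterations finished"
}
```

A hundred-iteration run would be stress_publish 100 30m followed by the build command. Any dotnet processes left behind by a timed-out round are removed by the next round's killall.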
Either I'm overlooking something or there's a very narrow race :-/
I've not been able to reproduce this with another 100+100 runs.
I blame cosmic rays.