Restore during build using dotnet msbuild is crashing with an unhandled exception on Unix operating systems. The problem appears to be a simple unhandled exception.
/__w/1/s/.dotnet/sdk/5.0.100-preview.1.20112.7/MSBuild.dll /nologo -maxcpucount /m -verbosity:m /v:minimal /bl:/__w/1/s/artifacts/log/Release/ToolsetRestore.binlog /clp:Summary /clp:ErrorsOnly;NoSummary /nr:false /p:TreatWarningsAsErrors=true /p:ContinuousIntegrationBuild=true /p:__ToolsetLocationOutputFile=/__w/1/s/artifacts/toolset/5.0.0-beta.20117.3.txt /t:__WriteToolsetLocation /warnaserror /__w/1/s/artifacts/toolset/restore.proj
Clearing NuGet HTTP cache: /home/helixbot_azpcontainer/.local/share/NuGet/v3-cache
Local resources cleared.
/__w/1/s/.dotnet/sdk/5.0.100-preview.1.20112.7/MSBuild.dll /nologo -logger:/__w/1/s/.packages/microsoft.dotnet.arcade.sdk/5.0.0-beta.20117.3/tools/netcoreapp2.1/Microsoft.DotNet.Arcade.Sdk.dll -maxcpucount /m -verbosity:m /v:minimal /bl:/__w/1/s/artifacts/log/Release/Build.binlog /clp:Summary /nr:false /p:TreatWarningsAsErrors=true /p:ContinuousIntegrationBuild=true /p:Configuration=Release /p:RepoRoot=/__w/1/s /p:Restore=true /p:Build=true /p:Rebuild=false /p:Test=false /p:Pack=false /p:IntegrationTest=false /p:PerformanceTest=false /p:Sign=false /p:Publish=false /p:SubsetCategory=mono /p:ConfigurationGroup=Release /p:ArchGroup=x64 /p:TargetArchitecture=x64 /p:CMakeArgs= /p:MonoEnableLLVM=false /warnaserror /__w/1/s/.packages/microsoft.dotnet.arcade.sdk/5.0.0-beta.20117.3/tools/Build.proj
Restore completed in 16.48 sec for /__w/1/s/tools-local/tasks/installer.tasks/installer.tasks.csproj.
installer.tasks -> /__w/1/s/artifacts/bin/installer.tasks/Debug/netstandard2.0/installer.tasks.dll
Tool 'coverlet.console' (version '1.7.0') was restored. Available commands: coverlet
Tool 'dotnet-reportgenerator-globaltool' (version '4.4.2') was restored. Available commands: reportgenerator
Tools restore was successful.
Restore completed in 5.11 sec for /__w/1/s/.packages/microsoft.dotnet.arcade.sdk/5.0.0-beta.20117.3/tools/Tools.proj.
Restore completed in 11.44 ms for /__w/1/s/src/mono/netcore/System.Private.CoreLib/System.Private.CoreLib.csproj.
/__w/1/s/src/mono/netcore/nuget/packages.builds : error : System.AggregateException: One or more errors occurred. (Could not find a part of the path '/tmp/NuGetScratch/fddf3c7a91d54497a7b711c945fe60d3/b8178b951e2f4151a1758acd7ba7326f.proj.nuget.dgspec.json'.)
/__w/1/s/src/mono/netcore/nuget/packages.builds : error : ---> System.IO.DirectoryNotFoundException: Could not find a part of the path '/tmp/NuGetScratch/fddf3c7a91d54497a7b711c945fe60d3/b8178b951e2f4151a1758acd7ba7326f.proj.nuget.dgspec.json'.
/__w/1/s/src/mono/netcore/nuget/packages.builds : error : at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func`2 errorRewriter)
/__w/1/s/src/mono/netcore/nuget/packages.builds : error : at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String path, OpenFlags flags, Int32 mode)
/__w/1/s/src/mono/netcore/nuget/packages.builds : error : at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
/__w/1/s/src/mono/netcore/nuget/packages.builds : error : at System.IO.FileStream..ctor(String path, FileMode mode)
/__w/1/s/src/mono/netcore/nuget/packages.builds : error : at NuGet.ProjectModel.DependencyGraphSpec.Save(String path)
/__w/1/s/src/mono/netcore/nuget/packages.builds : error : at NuGet.Commands.NoOpRestoreUtilities.PersistDGSpecFile(DependencyGraphSpec spec, String dgPath, ILogger log)
/__w/1/s/src/mono/netcore/nuget/packages.builds : error : at NuGet.Commands.RestoreCommand.EvaluateCacheFile()
/__w/1/s/src/mono/netcore/nuget/packages.builds : error : at NuGet.Commands.RestoreCommand.ExecuteAsync(CancellationToken token)
/__w/1/s/src/mono/netcore/nuget/packages.builds : error : at NuGet.Commands.RestoreRunner.ExecuteAsync(RestoreSummaryRequest summaryRequest, CancellationToken token)
/__w/1/s/src/mono/netcore/nuget/packages.builds : error : at NuGet.Commands.RestoreRunner.CompleteTaskAsync(List`1 restoreTasks)
/__w/1/s/src/mono/netcore/nuget/packages.builds : error : at NuGet.Commands.RestoreRunner.RunWithoutCommit(IEnumerable`1 restoreRequests, RestoreArgs restoreContext)
/__w/1/s/src/mono/netcore/nuget/packages.builds : error : --- End of inner exception stack trace ---
/__w/1/s/src/mono/netcore/nuget/packages.builds : error : at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
/__w/1/s/src/mono/netcore/nuget/packages.builds : error : at Microsoft.Build.NuGetSdkResolver.NuGetSdkResolver.NuGetAbstraction.GetSdkResult(SdkReference sdk, Object nuGetVersion, SdkResolverContext context, SdkResultFactory factory)
##[error]src/mono/netcore/nuget/packages.builds(0,0): error : System.AggregateException: One or more errors occurred. (Could not find a part of the pa
Builds where this has occurred
|Build|Kind|Timeline Record|
|---|---|---|
|534074|PR https://github.com/dotnet/runtime/pull/32736|Restore|
|533781|PR https://github.com/dotnet/runtime/pull/32716|Send tests to Helix|
|533729|PR https://github.com/dotnet/runtime/pull/32697|Build product|
|533526|PR https://github.com/dotnet/runtime/pull/32538|Build managed test components| |531845|PR https://github.com/dotnet/runtime/pull/32612|Build product|
|533303|PR https://github.com/dotnet/runtime/pull/32595|Build product|
|532918|Rolling|Send to Helix|
|532832|Rolling|Build product|
|531845|PR https://github.com/dotnet/runtime/pull/32612|Mono LLVM Product Build Linux x64 debug|
|531845|PR https://github.com/dotnet/runtime/pull/32612|Build product|
|531649|Rolling|Build native test components|
|531649|Rolling|CoreCLR Product Build OSX x64 checked|
|531548|PR https://github.com/dotnet/runtime/pull/32603|Build product|
|531548|PR https://github.com/dotnet/runtime/pull/32603|Mono LLVM Product Build OSX x64 release|
|531315|Rolling|Libraries Test Build OSX x64 Release|
|530973|PR https://github.com/dotnet/runtime/pull/32587|Libraries Build OSX x64 Debug|
|530946|PR https://github.com/dotnet/runtime/pull/32636|Mono LLVM Product Build Linux x64 debug|
|530946|PR https://github.com/dotnet/runtime/pull/32636|Build product|
|530196|PR https://github.com/dotnet/runtime/pull/32609|Build product|
|530196|PR https://github.com/dotnet/runtime/pull/32609|Mono Product Build Linux arm64 debug|
|529996|PR https://github.com/dotnet/runtime/pull/32149|CoreCLR Product Build OSX x64 checked|
|529996|PR https://github.com/dotnet/runtime/pull/32149|Build product|
|529965|PR https://github.com/dotnet/runtime/pull/32612|Mono Product Build Linux x64 release|
|529965|PR https://github.com/dotnet/runtime/pull/32612|Build product|
This is starting to become a bit of a blocking issue for the dotnet/runtime repository. Started showing up yesterday and now I'm seeing it across a number of builds.
The runs do have binlogs available from the restores but they don't really seem to have helpful information in them. Example
https://dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_apis/build/builds/529965/artifacts?artifactName=BuildLogs_Mono_Linux_x64_release&fileId=50260E1C6BB8177752CB6690A24F7D636CC8A076F6FB6CC6FAB6F5DB3AED4BC502&fileName=Build.binlog&api-version=5.0-preview.3
Updated to include more of the builds / jobs where this failed.
@rrelyea @dtivel @nkolev92 - Can you please chime in here? This is affecting a bunch of repos already and has the potential to become even more critical.
@nkolev92, this looks similar to https://github.com/NuGet/Home/issues/7341. Any ideas?
Are all the failures coming from the SDK resolver?
If so, I have some ideas that I can look into.
Might be related to https://github.com/NuGet/Home/issues/8793
Problem probably here: https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Core/Microsoft.Build.NuGetSdkResolver/RestoreRunnerEx.cs#L48
@nkolev92 I think all builds listed by @jaredpar failed with the same error. Let me know if you need anything from our side to drill down this.
I clicked on a couple of random ones and all of them had resolver stack traces.
Not sure 100% what exactly the sequence is that leads to these failures, but I'm pretty confident I know the general area where the issue is.
I'll be handling priority issues next week (conveniently), so I'll take an initial stab at it.
fyi @jeffkl
That seems weird, the directory should exist. I suppose DependencyGraphSpec.Save() should create the directory just to be safe
There's a permissions trick.
The tmp folder in linux is global, so the first operation that creates /tmp/NuGetScratch sets the permissions for that folder.
In every scenario that NuGet writes to the temp folder we usually call: https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Core/NuGet.Common/PathUtil/DirectoryUtility.cs instead of Directory.CreateDirectory, because that creates the tmp folder with the correct permissions, allowing other machine users to write to the same folder, in turn allowing all users to do a restore themselves.
Now, we don't want the dgspec folder to have the same open permissions, we want to use the default one, because the dg spec folder is obj.
Since the resolver runs a preview restore, I think fixing #8793 should be enough, since it makes the temp project location irrelevant since we don't write anything to it.
Added more affected builds
Presently this is causing 6% of our builds to fail in dotnet/runtime.
Is there a step we can do before calling restore that would allow us to work around this problem until a real fix comes in? Waiting for NuGet means we'll likely stay at 6% failure for several weeks (at a minimum) given the state of code flow.
@markwilkie, @tmat
It's a multi step thing but should unblock you:
@nkolev92
Would a chmod also work here? That is significantly simpler for us to implement as a work around than an extra restore (just realities of how the build is setup).
Yeah, that'd work too
@nkolev92 that work around of creating the /tmp/NuGetScratch directory ahead of time didn't work. Getting build failures on the change where we added the work around.
https://dev.azure.com/dnceng/public/_build/results?buildId=540894&view=logs
This is worisome for several reasons:
I'm fairly confident it's the correct fix...the dependency hasn't flown into dotnet yet https://github.com/dotnet/sdk/pull/10726.
@nkolev92 we are still hitting this with 5.0.100-preview.2.20155.14:
/__w/2/s/.dotnet/sdk/5.0.100-preview.2.20155.14/MSBuild.dll /nologo -maxcpucount /m -verbosity:m /v:minimal /bl:/__w/2/s/artifacts/log/Debug/ToolsetRestore.binlog /clp:Summary /clp:ErrorsOnly;NoSummary /nr:false /p:TreatWarningsAsErrors=true /p:ContinuousIntegrationBuild=true /p:__ToolsetLocationOutputFile=/__w/2/s/artifacts/toolset/5.0.0-beta.20153.1.txt /t:__WriteToolsetLocation /warnaserror /__w/2/s/artifacts/toolset/restore.proj
/__w/2/s/artifacts/toolset/restore.proj : error : System.AggregateException: One or more errors occurred. (Could not find a part of the path '/tmp/NuGetScratch/02fd45654b454f9599090cac8c2dbf3c/3d2806c25a204fc38f7508f48b753d99.proj.nuget.dgspec.json'.)
/__w/2/s/artifacts/toolset/restore.proj : error : ---> System.IO.DirectoryNotFoundException: Could not find a part of the path '/tmp/NuGetScratch/02fd45654b454f9599090cac8c2dbf3c/3d2806c25a204fc38f7508f48b753d99.proj.nuget.dgspec.json'.
/__w/2/s/artifacts/toolset/restore.proj : error : at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func`2 errorRewriter)
/__w/2/s/artifacts/toolset/restore.proj : error : at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String path, OpenFlags flags, Int32 mode)
/__w/2/s/artifacts/toolset/restore.proj : error : at System.IO.FileStream.OpenHandle(FileMode mode, FileShare share, FileOptions options)
/__w/2/s/artifacts/toolset/restore.proj : error : at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
/__w/2/s/artifacts/toolset/restore.proj : error : at System.IO.FileStream..ctor(String path, FileMode mode)
/__w/2/s/artifacts/toolset/restore.proj : error : at NuGet.ProjectModel.DependencyGraphSpec.Save(String path)
/__w/2/s/artifacts/toolset/restore.proj : error : at NuGet.Commands.NoOpRestoreUtilities.PersistDGSpecFile(DependencyGraphSpec spec, String dgPath, ILogger log)
/__w/2/s/artifacts/toolset/restore.proj : error : at NuGet.Commands.RestoreCommand.EvaluateCacheFile()
/__w/2/s/artifacts/toolset/restore.proj : error : at NuGet.Commands.RestoreCommand.ExecuteAsync(CancellationToken token)
/__w/2/s/artifacts/toolset/restore.proj : error : at NuGet.Commands.RestoreRunner.ExecuteAsync(RestoreSummaryRequest summaryRequest, CancellationToken token)
/__w/2/s/artifacts/toolset/restore.proj : error : at NuGet.Commands.RestoreRunner.CompleteTaskAsync(List`1 restoreTasks)
/__w/2/s/artifacts/toolset/restore.proj : error : at NuGet.Commands.RestoreRunner.RunWithoutCommit(IEnumerable`1 restoreRequests, RestoreArgs restoreContext)
/__w/2/s/artifacts/toolset/restore.proj : error : --- End of inner exception stack trace ---
/__w/2/s/artifacts/toolset/restore.proj : error : at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
/__w/2/s/artifacts/toolset/restore.proj : error : at System.Threading.Tasks.Task`1.get_Result()
/__w/2/s/artifacts/toolset/restore.proj : error : at Microsoft.Build.NuGetSdkResolver.NuGetSdkResolver.NuGetAbstraction.GetSdkResult(SdkReference sdk, Object nuGetVersion, SdkResolverContext context, SdkResultFactory factory)
/__w/2/s/artifacts/toolset/restore.proj : error MSB4236: The SDK 'Microsoft.DotNet.Arcade.Sdk' specified could not be found.
https://dev.azure.com/dnceng/public/_build/results?buildId=551633&view=logs
cc @jaredpar @hoyosjs
Sorry I was wrong, we aren't using the right bits. The commits points to an old build from early February: 7bac015acc5b7e0161a058c8febc98abe2386d51