Home: RestoreTask is not "thread safe" - NuGet restore fails with "Cannot create a file when that file already exists" when restore invoked in parallel

Created on 18 Dec 2018  路  21Comments  路  Source: NuGet/Home

Invoking the "Restore" target on projects in parallel leads to a race condition in committing the restore result to disk. This appears most often as the result of developers writing their own MSBuild targets and using the <MSBuild> task to invoke Restore.

The only workaround I could find was to avoid this calling pattern completely. Instead, you can call the the 'restore' target in NuGet.targets and pass in the list of projects.

cref https://github.com/dotnet/arcade/pull/1567
cref https://github.com/aspnet/BuildTools/blob/0872134136bf3d181e3ca3fa700b7ff392c055b3/files/KoreBuild/modules/projectbuild/module.targets#L21-L70

Ideas

I don't see an obvious solution. Here are some ideas

  • Use IBuildEngine4.GetRegisteredTaskObject within RestoreTask to avoid double-restores during the same build.
  • Use IBuildEngine4.GetRegisteredTaskObject within RestoreTask to detect a double-restore was invoked, and provide a warning message.
  • Include _which_ file is the problem in the error message.
  • Instead of failing immediately when you can't write the restore result, retry.

Details about Problem

NuGet product used: dotnet.exe or MSBuild.exe
dotnet.exe --version: 2.2.100
VS version: 15.9
OS version: Any
Worked before?: no

Detailed repro steps so we can see the same problem

  1. Create a set of 3 simple .NET standard class library projects (see attached repro). Two projects should both have ProjectReference to the third project.
Lib A --> LibC
Lib B --> LibC
  1. Create an MSBuild targets file like this. We'll call it "ci.proj".
<!-- ci.proj -->
<Project>
    <ItemGroup>
        <Projects Include="LibA/LibA.csproj" />
        <Projects Include="LibB/LibB.csproj" />
        <Projects Include="LibC/LibC.csproj" />
    </ItemGroup>

    <Target Name="Restore">
        <MSbuild Projects="@(Projects)"
                 Targets="Restore"
                 BuildInParallel="true" />
    </Target>
</Project>
  1. Run restore by invoking msbuild -t:ci.proj -t:restore

Result

NuGet fails in LibC with an error that says "Cannot create a file when that file already exists". The error message does not indicate which file was the problem.

 Restoring packages for C:\tmp\restoreRepro\LibC\LibC.csproj...
  Restoring packages for C:\tmp\restoreRepro\LibC\LibC.csproj...
  Restoring packages for C:\tmp\restoreRepro\LibB\LibB.csproj...
  Generating MSBuild file C:\tmp\restoreRepro\LibC\obj\LibC.csproj.nuget.g.props.
  Generating MSBuild file C:\tmp\restoreRepro\LibC\obj\LibC.csproj.nuget.g.targets.
  Generating MSBuild file C:\tmp\restoreRepro\LibC\obj\LibC.csproj.nuget.g.targets.
  Generating MSBuild file C:\tmp\restoreRepro\LibB\obj\LibB.csproj.nuget.g.props.
  Generating MSBuild file C:\tmp\restoreRepro\LibB\obj\LibB.csproj.nuget.g.targets.
  Restore completed in 12.8 ms for C:\tmp\restoreRepro\LibC\LibC.csproj.
  Restore completed in 14.61 ms for C:\tmp\restoreRepro\LibB\LibB.csproj.
  Restoring packages for C:\tmp\restoreRepro\LibA\LibA.csproj...
  Restoring packages for C:\tmp\restoreRepro\LibC\LibC.csproj...
C:\Users\namc\.dotnet\x64\sdk\2.2.100\NuGet.targets(114,5): error : Cannot create a file when that file already exists [C:\tmp\restoreRepro\LibB\LibB.csproj]
  Generating MSBuild file C:\tmp\restoreRepro\LibA\obj\LibA.csproj.nuget.g.props.
  Generating MSBuild file C:\tmp\restoreRepro\LibA\obj\LibA.csproj.nuget.g.targets.
  Restore completed in 128.42 ms for C:\tmp\restoreRepro\LibC\LibC.csproj.
  Restore completed in 147.55 ms for C:\tmp\restoreRepro\LibA\LibA.csproj.

Other suggested things

This is a race condition, so it may not appear right away. But it is consistent. I get consistent repros like this:

for($i = 0; $i -lt 100; $i+=1) {
  gci -re */obj/**/* | % {rm $_ } # clean the last restore result
  dotnet msbuild build.proj -bl -t:Restore
  if ($LASTEXITCODE -ne 0){ break }
}

Verbose Logs

msbuild.binlog.zip

Sample Project

repro-parallel-restore.zip

Restore Icebox 2 Bug

Most helpful comment

It kind of looks like multiple restores are happening at the same time for the same project? Is that true?

All 21 comments

I think there's another problem here beyond the race: if I understand the restore operation correctly, the _results of restore_ can differ when invoked for different project graph subsets, because a supergraph can result in a different solution to the diamond-dependency-resolved closure of package references.

@rainersigwald
I'm not sure I fully follow, but I don't think that's the case.
The restore operation on each project requires the full closure of projects to be "discoverable", regardless whether those transitive projects are getting restored themselves or not.

Nice report and repro. I鈥檓 curious if this is just an issue with .net core projects or if the older package format also has this problem.

This is fixed with the new RestoreTaskEx path which can be enabled via the /p:RestoreUseStaticGraphEvaluation=true switch.

@ViktorHofer
Seems unlikely.
Static graph restore changes the msbuild part, the rest of it is mostly the same.

I'd need to investigate to confirm though.

You are right, I just hit this again with RestoreTaskEx :(

Here's the call stack:

Cannot create a file when that file already exists.
   at System.IO.FileSystem.MoveFile(String sourceFullPath, String destFullPath, Boolean overwrite)
   at System.IO.File.Move(String sourceFileName, String destFileName)
   at NuGet.Common.FileUtility.Move(String sourceFileName, String destFileName)
   at NuGet.Common.FileUtility.Replace(String sourceFileName, String destFileName)
   at NuGet.Common.FileUtility.Replace(Action`1 writeSourceFile, String destFilePath)
   at NuGet.Commands.RestoreResult.CommitAssetsFileAsync(LockFileFormat lockFileFormat, IRestoreResult result, ILogger log, Boolean toolCommit, CancellationToken token)
   at NuGet.Commands.RestoreResult.CommitAsync(ILogger log, CancellationToken token)
   at NuGet.Commands.RestoreRunner.CommitAsync(RestoreResultPair restoreResult, CancellationToken token)
   at NuGet.Commands.RestoreRunner.ExecuteAndCommitAsync(RestoreSummaryRequest summaryRequest, CancellationToken token)
   at NuGet.Commands.RestoreRunner.CompleteTaskAsync(List`1 restoreTasks)
   at NuGet.Commands.RestoreRunner.RunAsync(IEnumerable`1 restoreRequests, RestoreArgs restoreContext, CancellationToken token)
   at NuGet.Commands.RestoreRunner.RunAsync(RestoreArgs restoreContext, CancellationToken token)
   at NuGet.Build.Tasks.BuildTasksUtility.RestoreAsync(DependencyGraphSpec dependencyGraphSpec, Boolean interactive, Boolean recursive, Boolean noCache, Boolean ignoreFailedSources, Boolean disableParallel, Boolean force, Boolean forceEvaluate, Boolean hideWarningsAndErrors, Boolean restorePC, ILogger log, CancellationToken cancellationToken)
   at NuGet.Build.Tasks.Console.MSBuildStaticGraphRestore.RestoreAsync(String entryProjectFilePath, IDictionary`2 globalProperties, IReadOnlyDictionary`2 options)

Thanks @ViktorHofer That's super helpful.
We have a mechanism for locking on that but it could often be expensive.

We've focused on parallelization of package install operations cause those are a lot more infrequent.
We haven't used the expensive locking mechanism for the project specific assets.

I think you should be able to force this to happen locally with the repro that @natemcmaster posted. @natemcmaster implemented a workaround for that a year ago in Arcade but with the new RestoreTaskEx path, this doesn't work anymore which basically means we are blocked turning on the new static graph restore until this is fixed.

Let me know if I can help you debug this somehow. I also just pinged @jeffkl regarding this offline.

Ok @jeffkl pointed out that this could be related to us having multiple files with the same name, ie packages.proj which are now binclashing in the intermediate dir.

Yeah I'm pretty confident after looking at the code (which does a delete then move) that two or more projects are sharing the same restore output path which is not supported. I know Arcade defaults to using a rooted BaseIntermediateOutput path so you have to ensure that every project in the tree has a unique file name.

NuGet could add an error and fail restore if two projects collide as well

Happened again for a uniquely named project:

  Restored F:\workspace\_work\1\s\src\installer\test\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.AppHost.Tests\Microsoft.NET.HostModel.AppHost.Tests.csproj (in 6.04 sec).
  Restored F:\workspace\_work\1\s\src\libraries\System.Reflection.Extensions\tests\System.Reflection.Extensions.Tests.csproj (in 101 ms).
  Restored F:\workspace\_work\1\s\src\libraries\System.Reflection.Emit\tests\System.Reflection.Emit.Tests.csproj (in 67 ms).
F:\workspace\_work\1\s\.dotnet\sdk\5.0.100-preview.3.20170.6\NuGet.RestoreEx.targets(10,5): error : Cannot create a file when that file already exists. [F:\workspace\_work\1\s\src\installer\test\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.AppHost.Tests\Microsoft.NET.HostModel.AppHost.Tests.csproj]
F:\workspace\_work\1\s\.dotnet\sdk\5.0.100-preview.3.20170.6\NuGet.RestoreEx.targets(10,5): error :    at System.IO.FileSystem.MoveFile(String sourceFullPath, String destFullPath, Boolean overwrite) [F:\workspace\_work\1\s\src\installer\test\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.AppHost.Tests\Microsoft.NET.HostModel.AppHost.Tests.csproj]
F:\workspace\_work\1\s\.dotnet\sdk\5.0.100-preview.3.20170.6\NuGet.RestoreEx.targets(10,5): error :    at NuGet.Common.FileUtility.Move(String sourceFileName, String destFileName) [F:\workspace\_work\1\s\src\installer\test\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.AppHost.Tests\Microsoft.NET.HostModel.AppHost.Tests.csproj]
F:\workspace\_work\1\s\.dotnet\sdk\5.0.100-preview.3.20170.6\NuGet.RestoreEx.targets(10,5): error :    at NuGet.Common.FileUtility.Replace(String sourceFileName, String destFileName) [F:\workspace\_work\1\s\src\installer\test\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.AppHost.Tests\Microsoft.NET.HostModel.AppHost.Tests.csproj]
F:\workspace\_work\1\s\.dotnet\sdk\5.0.100-preview.3.20170.6\NuGet.RestoreEx.targets(10,5): error :    at NuGet.Common.FileUtility.Replace(Action`1 writeSourceFile, String destFilePath) [F:\workspace\_work\1\s\src\installer\test\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.AppHost.Tests\Microsoft.NET.HostModel.AppHost.Tests.csproj]
F:\workspace\_work\1\s\.dotnet\sdk\5.0.100-preview.3.20170.6\NuGet.RestoreEx.targets(10,5): error :    at NuGet.Commands.BuildAssetsUtils.WriteXML(String path, XDocument doc) [F:\workspace\_work\1\s\src\installer\test\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.AppHost.Tests\Microsoft.NET.HostModel.AppHost.Tests.csproj]
F:\workspace\_work\1\s\.dotnet\sdk\5.0.100-preview.3.20170.6\NuGet.RestoreEx.targets(10,5): error :    at NuGet.Commands.BuildAssetsUtils.WriteFiles(IEnumerable`1 files, ILogger log) [F:\workspace\_work\1\s\src\installer\test\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.AppHost.Tests\Microsoft.NET.HostModel.AppHost.Tests.csproj]
F:\workspace\_work\1\s\.dotnet\sdk\5.0.100-preview.3.20170.6\NuGet.RestoreEx.targets(10,5): error :    at NuGet.Commands.RestoreResult.CommitAssetsFileAsync(LockFileFormat lockFileFormat, IRestoreResult result, ILogger log, Boolean toolCommit, CancellationToken token) [F:\workspace\_work\1\s\src\installer\test\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.AppHost.Tests\Microsoft.NET.HostModel.AppHost.Tests.csproj]
F:\workspace\_work\1\s\.dotnet\sdk\5.0.100-preview.3.20170.6\NuGet.RestoreEx.targets(10,5): error :    at NuGet.Commands.RestoreResult.CommitAsync(ILogger log, CancellationToken token) [F:\workspace\_work\1\s\src\installer\test\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.AppHost.Tests\Microsoft.NET.HostModel.AppHost.Tests.csproj]
F:\workspace\_work\1\s\.dotnet\sdk\5.0.100-preview.3.20170.6\NuGet.RestoreEx.targets(10,5): error :    at NuGet.Commands.RestoreRunner.CommitAsync(RestoreResultPair restoreResult, CancellationToken token) [F:\workspace\_work\1\s\src\installer\test\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.AppHost.Tests\Microsoft.NET.HostModel.AppHost.Tests.csproj]
F:\workspace\_work\1\s\.dotnet\sdk\5.0.100-preview.3.20170.6\NuGet.RestoreEx.targets(10,5): error :    at NuGet.Commands.RestoreRunner.ExecuteAndCommitAsync(RestoreSummaryRequest summaryRequest, CancellationToken token) [F:\workspace\_work\1\s\src\installer\test\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.AppHost.Tests\Microsoft.NET.HostModel.AppHost.Tests.csproj]
F:\workspace\_work\1\s\.dotnet\sdk\5.0.100-preview.3.20170.6\NuGet.RestoreEx.targets(10,5): error :    at NuGet.Commands.RestoreRunner.CompleteTaskAsync(List`1 restoreTasks) [F:\workspace\_work\1\s\src\installer\test\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.AppHost.Tests\Microsoft.NET.HostModel.AppHost.Tests.csproj]
F:\workspace\_work\1\s\.dotnet\sdk\5.0.100-preview.3.20170.6\NuGet.RestoreEx.targets(10,5): error :    at NuGet.Commands.RestoreRunner.RunAsync(IEnumerable`1 restoreRequests, RestoreArgs restoreContext, CancellationToken token) [F:\workspace\_work\1\s\src\installer\test\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.AppHost.Tests\Microsoft.NET.HostModel.AppHost.Tests.csproj]
F:\workspace\_work\1\s\.dotnet\sdk\5.0.100-preview.3.20170.6\NuGet.RestoreEx.targets(10,5): error :    at NuGet.Commands.RestoreRunner.RunAsync(RestoreArgs restoreContext, CancellationToken token) [F:\workspace\_work\1\s\src\installer\test\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.AppHost.Tests\Microsoft.NET.HostModel.AppHost.Tests.csproj]
F:\workspace\_work\1\s\.dotnet\sdk\5.0.100-preview.3.20170.6\NuGet.RestoreEx.targets(10,5): error :    at NuGet.Build.Tasks.BuildTasksUtility.RestoreAsync(DependencyGraphSpec dependencyGraphSpec, Boolean interactive, Boolean recursive, Boolean noCache, Boolean ignoreFailedSources, Boolean disableParallel, Boolean force, Boolean forceEvaluate, Boolean hideWarningsAndErrors, Boolean restorePC, ILogger log, CancellationToken cancellationToken) [F:\workspace\_work\1\s\src\installer\test\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.AppHost.Tests\Microsoft.NET.HostModel.AppHost.Tests.csproj]
F:\workspace\_work\1\s\.dotnet\sdk\5.0.100-preview.3.20170.6\NuGet.RestoreEx.targets(10,5): error :    at NuGet.Build.Tasks.Console.MSBuildStaticGraphRestore.RestoreAsync(String entryProjectFilePath, IDictionary`2 globalProperties, IReadOnlyDictionary`2 options) [F:\workspace\_work\1\s\src\installer\test\Microsoft.NET.HostModel.Tests\Microsoft.NET.HostModel.AppHost.Tests\Microsoft.NET.HostModel.AppHost.Tests.csproj]
##[error].dotnet\sdk\5.0.100-preview.3.20170.6\NuGet.RestoreEx.targets(10,5): error : (NETCORE_ENGINEERING_TELEMETRY=Restore) Cannot create a file when that file already exists.
   at System.IO.FileSystem.MoveFile(String sourceFullPath, String destFullPath, Boolean overwrite)
   at NuGet.Common.FileUtility.Move(String sourceFileName, String destFileName)
   at NuGet.Common.FileUtility.Replace(String sourceFileName, String destFileName)
   at NuGet.Common.FileUtility.Replace(Action`1 writeSourceFile, String destFilePath)
   at NuGet.Commands.BuildAssetsUtils.WriteXML(String path, XDocument doc)
   at NuGet.Commands.BuildAssetsUtils.WriteFiles(IEnumerable`1 files, ILogger log)
   at NuGet.Commands.RestoreResult.CommitAssetsFileAsync(LockFileFormat lockFileFormat, IRestoreResult result, ILogger log, Boolean toolCommit, CancellationToken token)
   at NuGet.Commands.RestoreResult.CommitAsync(ILogger log, CancellationToken token)
   at NuGet.Commands.RestoreRunner.CommitAsync(RestoreResultPair restoreResult, CancellationToken token)
   at NuGet.Commands.RestoreRunner.ExecuteAndCommitAsync(RestoreSummaryRequest summaryRequest, CancellationToken token)
   at NuGet.Commands.RestoreRunner.CompleteTaskAsync(List`1 restoreTasks)
   at NuGet.Commands.RestoreRunner.RunAsync(IEnumerable`1 restoreRequests, RestoreArgs restoreContext, CancellationToken token)
   at NuGet.Commands.RestoreRunner.RunAsync(RestoreArgs restoreContext, CancellationToken token)
   at NuGet.Build.Tasks.BuildTasksUtility.RestoreAsync(DependencyGraphSpec dependencyGraphSpec, Boolean interactive, Boolean recursive, Boolean noCache, Boolean ignoreFailedSources, Boolean disableParallel, Boolean force, Boolean forceEvaluate, Boolean hideWarningsAndErrors, Boolean restorePC, ILogger log, CancellationToken cancellationToken)
   at NuGet.Build.Tasks.Console.MSBuildStaticGraphRestore.RestoreAsync(String entryProjectFilePath, IDictionary`2 globalProperties, IReadOnlyDictionary`2 options)

https://dev.azure.com/dnceng/public/_build/results?buildId=575356&view=logs

Definitely looks like a race as you can see that the same project was restored above.

It kind of looks like multiple restores are happening at the same time for the same project? Is that true?

Those are transitive dependencies, ie AppHost.Bundle.Tests.csproj references Microsoft.NET.HostModel.csproj which then gets restored multiple times:

  Determining projects to restore...
  Determining projects to restore...
  Determining projects to restore...
  Determining projects to restore...
  Determining projects to restore...
  Determining projects to restore...
  Restored C:\git\runtime4\src\installer\managed\Microsoft.NET.HostModel\Microsoft.NET.HostModel.csproj (in 672 ms).
  Restored C:\git\runtime4\src\installer\managed\Microsoft.NET.HostModel\Microsoft.NET.HostModel.csproj (in 906 ms).
  Restored C:\git\runtime4\src\installer\managed\Microsoft.NET.HostModel\Microsoft.NET.HostModel.csproj (in 1.15 sec).
  Restored C:\git\runtime4\src\installer\managed\Microsoft.NET.HostModel\Microsoft.NET.HostModel.csproj (in 937 ms).
  Restored C:\git\runtime4\src\installer\managed\Microsoft.NET.HostModel\Microsoft.NET.HostModel.csproj (in 1.23 sec).

So restoring entrypoints with same dependencies in parallel leads to the issue that we are seeing here. The obvious solution here is to move to a single restore container instead of restoring in parallel. I'll see what I can do about that in dotnet/runtime.

@nkolev92 there's parts of NuGet that support multiple restores at the same time, but I don't think we support restoring the same project at the same time right?

@ViktorHofer one massive restore for the whole repo is sort of how retail NuGet is designed. My static graph evaluation improvements are meant to speed up the MSBuild side of things by loading projects faster. I made other improvements to the NuGet side to speed up no-op restores as well. I'm not sure if that's what was the goal of restoring portions of the tree in parallel, but hopefully one massive restore will be fast and not have any race conditions 馃憤

there's parts of NuGet that support multiple restores at the same time, but I don't think we support restoring the same project at the same time right

Not really. The shared parts are synchronized, global packages folder, http cache, plugin cache, temp cache, but we never synchronize the project intermediates.
Process wide synchronization is expensive so we avoid it unless absolutely necessary.

This problem also occurs when running multiple build tasks in Visual Studio Code. I define a "Build All" task in tasks.json with a "dependsOn" property that contains multiple other tasks. Two of these tasks load the same NuGet package. If that package is not currently in the cache, one of the dependent tasks will fail with this error.

Two of these tasks load the same NuGet package. If that package is not currently in the cache, one of the dependent tasks will fail with this error.

Hey @johnpankowicz if you are referring to usage of packages from the global packages folder, then it's not something that NuGet monitors.
NuGet guards the installation in the global packages folder with process wide locks.

I define a "Build All" task in tasks.json with a "dependsOn" property that contains multiple other tasks.

Are these tasks running multiple restores?

Is dotnet restore getting called explicitly?

@nkolev92 - I am referring to the global packages folder. I deleted the contents to test what happens when someone clones my Github solution and has none of my NuGet packages installed.
Each of the dependent tasks just do a "dotnet build" of a project that has multiple NuGet package references. "dotnet restore" is not being run explicitly. Many of the projects reference the same Nuget packages. I need to run the "Build All" task usually twice to get around this error.

@johnpankowicz
I'm not familiar with the details of how VS Code builds, but whenever you call dotnet build restore is run automatically and will detect packages missing from the global packages folder if necessary.

If you can repro any dotnet build or dotnet restore not getting the correct packages, we'd be happy to take a look at that (best done in a separate issue).

Was this page helpful?
0 / 5 - 0 ratings