MSBuild: Understanding the Clean target

Created on 8 Aug 2017  ·  4 Comments  ·  Source: dotnet/msbuild

Tried a recent experiment in the Roslyn repo:

> git clean -dxf .
> msbuild /t:restore /v:m /m Roslyn.sln
...
> msbuild /t:build /v:m /m Roslyn.sln
...
> msbuild /t:clean /v:m /m Roslyn.sln

My expectation is that once this is complete, my Binaries\Debug directory should be essentially empty: possibly some directories left around, but all content like .dll / .exe files removed. Instead, what I see is 9,000+ .dll / .exe files in my Binaries\Debug directory.

The Roslyn build is, to our understanding, correct and mostly devoid of hacks around build output. My mental model is that Clean is the reverse of Build, and hence the above should leave my build free of artifacts. Given that Clean doesn't actually clean, the only safe operation our devs can do for a rebuild is essentially to delete Binaries\Debug.

Is my mental model wrong here? Or are there likely bugs in the build that we need to track down to ensure Clean actually cleans?

question

Most helpful comment

What Clean does and how

That mental model doesn't capture the nuances of the current implementation of Clean, which is pretty complex.

The intuitive goal of the Clean target is to delete everything produced by previous non-clean runs. That goal is hard to achieve because the MSBuild engine doesn't have enough information to know what "the outputs of the build" are. A task implementation can create arbitrarily many files, and build input parameters or other state can cause different outputs to be produced.

To deal with that, the common targets implement an honor-system method of tracking the output of "the last build". Well-behaved targets emit their outputs into an item named @(FileWrites), which is serialized to $(CleanFile) in the obj directory (it ends with .FileListAbsolute.txt) in a target named _CleanRecordFileWrites. Clean can then read that list and delete files in it during a subsequent MSBuild invocation.
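
As a rough illustration of that honor system (a sketch, not code from the common targets), a custom target that generates a file could register it as below. The target name `GenerateBuildStamp`, the property `_BuildStampFile`, and the file name are hypothetical, and hooking `CoreCompile` assumes a typical C#/VB project.

```xml
<!-- Hypothetical target: the names and the generated file are made up for illustration. -->
<Target Name="GenerateBuildStamp" BeforeTargets="CoreCompile">
  <PropertyGroup>
    <_BuildStampFile>$(IntermediateOutputPath)BuildStamp.txt</_BuildStampFile>
  </PropertyGroup>

  <!-- Produce the file. -->
  <WriteLinesToFile File="$(_BuildStampFile)"
                    Lines="Built $(MSBuildProjectName)"
                    Overwrite="true" />

  <!-- Register it in @(FileWrites) so a later Clean knows to delete it. -->
  <ItemGroup>
    <FileWrites Include="$(_BuildStampFile)" />
  </ItemGroup>
</Target>
```

A target that skips the last step still builds fine, but the file it wrote is one Clean never hears about.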

That is made more complicated by the possibility of incremental builds--builds that _would_ have written a file, but it was up to date, so _this build_ didn't do so. But you still want to delete those build outputs on Clean! So there's logic in common targets to preserve the list of files that were written last time, unless they really shouldn't be present any more (because you changed something). That's the AndPrior part of _CleanGetCurrentAndPriorFileWrites.

The next complicating factor is the possibility that multiple projects might be built to the same output directory. In that case, cleaning a single project should _not_ delete the outputs of other projects. But it would, because some references get copied to the output folder. To account for this, there's a second item group @(FileWritesShareable). That list is treated specially: only items from it that are believed to be unique to this project are written to the $(CleanFile) for later deletion.

The Roslyn problem at hand

I captured logs of the builds you described and looked through them to figure out what the problem was.

The files that don't get deleted are added to @(FileWritesShareable) in the Build invocation, but don't get serialized to $(CleanFile). That's because the heuristic used for "this 'Shareable' output is unique to this project" is "output is under the project file's directory".
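
To see which files fall on the wrong side of that heuristic in your own build, something like the following diagnostic target might help. It is only a sketch: the target and item names are hypothetical, it is not the logic the common targets actually run, and it assumes @(FileWritesShareable) is still populated once Build has finished.

```xml
<!-- Hypothetical diagnostic: lists "Shareable" outputs that live outside the project directory. -->
<Target Name="ReportShareableOutsideProjectDir" AfterTargets="Build">
  <!-- Split the shareable list by whether each file is under the project file's directory. -->
  <FindUnderPath Path="$(MSBuildProjectDirectory)" Files="@(FileWritesShareable)">
    <Output TaskParameter="InPath" ItemName="_ShareableInProjectDir" />
    <Output TaskParameter="OutOfPath" ItemName="_ShareableOutsideProjectDir" />
  </FindUnderPath>

  <!-- One message per file that the "under the project directory" test rejects. -->
  <Message Importance="high"
           Condition="'@(_ShareableOutsideProjectDir)' != ''"
           Text="Shareable output outside project dir: %(_ShareableOutsideProjectDir.Identity)" />
</Target>
```

With a dedicated output folder such as Binaries\Debug, I'd expect most copied references to show up in the "outside" list.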

That's not a very good heuristic, as evidenced by your case. I bet we can do better--I filed #2410--but we probably can't do so in an update.

Possible workarounds

  • Go with the flow and don't use a dedicated output folder (yes, this is worse in many ways)
  • Patch up the @(FileWrites) list with a custom target, since you know it's safe to include @(FileWritesShareable). Untested:
<Target Name="EnsureDeletionOfCopiedReferences"
        BeforeTargets="_CleanRecordFileWrites"
        DependsOnTargets="_CleanGetCurrentAndPriorFileWrites">
  <!-- Work around https://github.com/Microsoft/msbuild/issues/2410 -->
  <ItemGroup>
    <_PreviousCurrentFileWrites Include="@(_CleanCurrentFileWrites)" />
    <_CleanCurrentFileWrites Remove="@(_CleanCurrentFileWrites)" />
  </ItemGroup>
  <!-- Remove duplicates from files produced in this build. -->
  <RemoveDuplicates Inputs="@(_PreviousCurrentFileWrites);@(FileWritesShareable)" >
    <Output TaskParameter="Filtered" ItemName="_CleanCurrentFileWrites"/>
  </RemoveDuplicates>
</Target>

All 4 comments

Okay ... that pretty much destroyed my mental model 😄

Thanks for the detailed explanation here. Really helps.

@rainersigwald is there a way to remove the empty folder? I realize clean can remove the files. But it will not remove the folder, so there are many empty folders left.

@wli3 you can try playing with the RemoveDir task.
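
A minimal sketch of what that might look like, untested. The target name is hypothetical, and note that RemoveDir deletes a directory together with everything in it, so it is not limited to empty folders.

```xml
<!-- Hypothetical target: removes this project's intermediate directory once Clean has run. -->
<Target Name="RemoveLeftoverDirsAfterClean" AfterTargets="Clean">
  <!-- RemoveDir deletes the directory and all of its contents, not just empty folders. -->
  <RemoveDir Directories="$(IntermediateOutputPath)" />
</Target>
```

Adding $(OutDir) to the list would clear the output folder too, but with an output directory shared between projects that would also delete the other projects' files.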
