Msbuild: 64-bit node processes load MSBuild binaries from 32-bit location

Created on 26 May 2020 · 9 Comments · Source: dotnet/msbuild

Have a 64-bit process that uses MSBuild Locator to load 64-bit MSBuild and run a multiproc build. It creates MSBuild.exe node processes from C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Current\Bin\amd64\MSBuild.exe, but those processes load the MSBuild binaries from the 32-bit location: C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Current\Bin\Microsoft.Build.Framework.dll

Since the central process is loading binaries from C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Current\Bin\amd64\Microsoft.Build.Framework.dll, there's a mismatch which results in an exception:

Microsoft.Build.Exceptions.BuildAbortedException: Build was canceled. Failed to successfully launch or connect to a child MSBuild.exe process. Verify that the MSBuild.exe "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Current\Bin\amd64\MSBuild.exe" launches successfully, and that it is loading the same microsoft.build.dll that the launching process loaded. If the location seems incorrect, try specifying the correct location in the BuildParameters object, or with the MSBUILD_EXE_PATH environment variable.
   at bool Microsoft.Build.BackEnd.NodeProviderOutOfProc.CreateNode(int nodeId, INodePacketFactory factory, NodeConfiguration configuration)
   at string Microsoft.Ide.Shell.ExceptionLogger.GetExceptionText(Exception ex) in C:/Ide/src/Microsoft.Ide.Shell/Exceptions/ExceptionLogger.cs:line 258
   at Boxed System.Lazy<T>.CreateValue()
   at T System.Lazy<T>.LazyInitValue()
   at void Microsoft.Ide.Shell.ExceptionLogger.Report(Exception ex, string title, ExceptionKind exceptionKind) in C:/Ide/src/Microsoft.Ide.Shell/Exceptions/ExceptionLogger.cs:line 127
   at bool Microsoft.Build.BackEnd.NodeProviderOutOfProc.CreateNode(int nodeId, INodePacketFactory factory, NodeConfiguration configuration)
   at int Microsoft.Build.BackEnd.NodeManager.AttemptCreateNode(INodeProvider nodeProvider, NodeConfiguration nodeConfiguration)
   at NodeInfo Microsoft.Build.BackEnd.NodeManager.CreateNode(NodeConfiguration configuration, NodeAffinity nodeAffinity)
   at void Microsoft.Build.Execution.BuildManager.PerformSchedulingActions(IEnumerable<ScheduleResponse> responses)
   at void Microsoft.Build.Execution.BuildManager.IssueBuildSubmissionToScheduler(BuildSubmission submission, bool allowMainThreadBuild)
   at void Microsoft.Build.Execution.BuildManager.ExecuteSubmission(BuildSubmission submission, bool allowMainThreadBuild)+() => { }
   at void Microsoft.Build.Execution.BuildManager.ProcessWorkQueue(Action action)

This started happening after an update to VS 16.6.0. It was not happening with VS 16.5.4 or earlier.

regression

All 9 comments

I've changed the 64-bit process to load Microsoft.Build.dll from the 32-bit location and all is well again.

But why did it only break recently? How did it work fine all this time?

Never mind, it still doesn't work. If I load Microsoft.Build.dll from the 32-bit location, it starts 32-bit nodes and also fails. Not sure why I thought it was intermittently working.

OK, I've found the problem. The handshake process uses a long value for client and host that is derived from... drumroll... the LastWriteTime of Microsoft.Build.dll:

https://github.com/microsoft/msbuild/blob/ba9a1d64a7abf15a8505827c00413156a3eb7f62/src/Build/Resources/Constants.cs#L171

If you are a 64-bit process that hosts MSBuild and you're using it to spawn node processes, which are also 64-bit, you're going to run into this issue. The node processes still load Microsoft.Build.dll from the 32-bit location. Your central process loads Microsoft.Build.dll from amd64. If the timestamp on both files is different, the nodes won't be able to connect.

Apparently we've been lucky all this time (e.g. 16.5.4) because the timestamp on both files was the same.

However, in 16.6 they are now off by about 2.86 seconds (0:00:00:02.8585535).
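
To make the failure mode concrete, here is a minimal sketch in Python. It is a simplified model, not MSBuild's actual handshake code: it assumes the handshake value is nothing more than the file's last-write time, and the function names and file names are hypothetical. Two byte-identical copies of a file match while their timestamps agree and stop matching once one copy is written about 2.86 seconds later.

```python
import os
import tempfile


def handshake_from_mtime(path):
    """Hypothetical handshake value: the file's last-write time in nanoseconds."""
    return os.stat(path).st_mtime_ns


def can_connect(host_dll, node_dll):
    """Host and node only 'connect' when both derive the same handshake value."""
    return handshake_from_mtime(host_dll) == handshake_from_mtime(node_dll)


def demo():
    """Return (result with equal timestamps, result with skewed timestamps)."""
    with tempfile.TemporaryDirectory() as root:
        # Stand-ins for the amd64 copy and the 32-bit copy of the same DLL.
        host = os.path.join(root, "amd64_Microsoft.Build.dll")
        node = os.path.join(root, "Microsoft.Build.dll")
        for path in (host, node):
            with open(path, "wb") as f:
                f.write(b"identical contents")

        base_ns = 1_590_000_000 * 10**9  # arbitrary fixed timestamp
        os.utime(host, ns=(base_ns, base_ns))
        os.utime(node, ns=(base_ns, base_ns))
        same = can_connect(host, node)

        # Shift one copy by roughly the 2.8585535 s skew reported above.
        skew_ns = 2_858_553_500
        os.utime(node, ns=(base_ns + skew_ns, base_ns + skew_ns))
        skewed = can_connect(host, node)
        return same, skewed
```

The file contents never change in this sketch, which is the point: a timestamp-derived value can reject two copies of the same build.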

@KirillOsenkov that is some very good detective work

Now I'm not even sure if it's a regression or not. On one hand, nothing changed in the source in the past 5 years that caused this (it's always been like this). On the other hand, the timestamps are now different due to a fluke in copying speeds and timings, so it is now broken, but it wasn't broken before?

Here are a couple of recent PRs that touched this area but didn't cause this:
https://github.com/microsoft/msbuild/pull/4162
https://github.com/microsoft/msbuild/pull/5196

Here's one way that I think could fix it:
https://github.com/microsoft/msbuild/pull/5379

Another one would be something like this:

Path.Combine(BuildEnvironmentHelper.Instance.MSBuildToolsDirectory32, "Microsoft.Build.dll").ToLowerInvariant().GetHashCode()

Would this be an appropriate use of the module version id? (Edit: I realized later it wouldn't be very useful in this scenario.)

Roslyn uses it to check if the analyzer assembly it has loaded matches the assembly on disk.


Path.Combine(BuildEnvironmentHelper.Instance.MSBuildToolsDirectory32, "Microsoft.Build.dll").ToLowerInvariant().GetHashCode()

String hash codes are randomized per appdomain, so this wouldn't work unless I'm missing something.

Yes, MVID is ideal for this scenario. I collect various ways to read the MVID here:
https://github.com/KirillOsenkov/MetadataTools/blob/master/README.md#reading-an-assembly-mvid

Yes, you're right, string.GetHashCode() wouldn't work here, so we'd need MD5 or SHA1/256.
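
As a rough illustration of the cryptographic-hash alternative, the sketch below (Python, not MSBuild code) derives a 64-bit value from a lowercased path using SHA-256. The function name and the example path are hypothetical. Python's built-in hash() for strings is also randomized per process by default, so it is a reasonable analogue of the problem described above, while a digest-based value is the same in every process.

```python
import hashlib


def stable_path_hash(path):
    """Derive a process-independent signed 64-bit value from a path.

    Lowercasing is a rough stand-in for ToLowerInvariant() in the snippet above.
    """
    digest = hashlib.sha256(path.lower().encode("utf-8")).digest()
    # Keep the first 8 bytes so the result fits in a signed 64-bit integer.
    return int.from_bytes(digest[:8], "big", signed=True)


# Hypothetical path, for illustration only.
value = stable_path_hash(r"C:\MSBuild\Current\Bin\Microsoft.Build.dll")
```

Truncating the digest to 8 bytes is a design choice made here only so the value fits in a long; any fixed-length slice of a stable digest would do.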

Hopefully my PR 5379 is good enough, as it reads the Git SHA from the AssemblyInformationalVersion. Probably as good as the MVID.
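
The idea of comparing builds by commit rather than by timestamp can be sketched as follows. This is Python, not the code in the PR, and it assumes the informational version string has the common "version+commit" shape. The function names and the sample strings in the usage lines are hypothetical.

```python
def commit_from_informational_version(value):
    """Return the part after '+' (assumed to be the Git SHA), or None if absent."""
    _, separator, sha = value.partition("+")
    return sha if separator and sha else None


def same_build(host_version, node_version):
    """Treat two assemblies as the same build only if both carry the same SHA."""
    host_sha = commit_from_informational_version(host_version)
    node_sha = commit_from_informational_version(node_version)
    return host_sha is not None and host_sha == node_sha


# Hypothetical version strings, for illustration only.
matching = same_build("16.6.0+abc1234", "16.6.0+abc1234")
mismatched = same_build("16.6.0+abc1234", "16.6.0+def5678")
```

Unlike a file timestamp, this value is fixed at compile time, so copying the same assembly to two locations at different moments does not change it.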
