Roslyn: [Umbrella] Compilers should be deterministic: same inputs generate same outputs

Created on 10 Feb 2015 · 28 Comments · Source: dotnet/roslyn

This is an umbrella issue for making the Roslyn compilers deterministic. See also Open Issues for Determinism.

There are a few issues:

  • [X] #223 Anonymous types should be output in a deterministic order.
  • [X] #360 The string heap should be output in a deterministic order.
  • [ ] #375 Decimal parsing reflects bugs in the BCL; fixed in Core 3.
  • [X] #803 The order of symbols produced from the DataFlowAnalysis API should be deterministic.
  • [x] #926 Produce PDBs with deterministic GUIDs and timestamps by default. This requires a CLR spec change; see https://github.com/dotnet/coreclr/issues/1615. The CLR spec has been agreed internally and implemented in Roslyn, but the new spec has not been published and this is not enabled by default.
  • [x] #949 Produce only relative paths in caller file name arguments and in the PDB
  • [X] #1228 Netmodules should generate per-assembly unique names for PrivateImplementationDetails
  • [x] #1319 VB Anonymous types should generate per-assembly unique names for netmodules
  • [X] #1428 C# should mangle module name uniquely to produce anonymous types.
  • [X] #1430 C# PrivateImplementationDetails should not have "." in its name
  • [X] #1440 Ordering of synthesized delegates in metadata should be deterministic
  • [ ] #1506 Diagnostics should be returned in a deterministic order
  • [x] #2184 Make the command-line compiler capable of generating a reference assembly
    • [x] #17612 Refine what is in reference assemblies and what diagnostics prevent generating one
  • [x] #2303 Add support for /features:deterministic for msbuild (fixes timestamp and MVID)
  • [x] #4171 Attributes are sometimes emitted in nondeterministic order
  • [x] #4172 VB static locals emitted to fields in nondeterministic order
  • [x] #4221 Roslyn parses double values differently on x32 vs x64
  • [x] #4578 Auto-generated assembly version is nondeterministic
  • [x] #5070 Document what compiler inputs affect its deterministic output
  • [x] #7111 VB interfaces not emitted in a deterministic order across partial types
  • [x] #7112 VB shared field initializers should be emitted in source order across partial types
  • [x] #7262 Constant folding uses wrong precision
  • [x] #7595 Small method body cache nondeterministic (see also #17052)
  • [ ] #8703 Review usage of random guid in Emit/ErrorType.cs
  • [x] #9813 Need a mechanism to cause file-relative PDB path to be included in the assembly.
  • [ ] #10321 Document compiler /determinism effect on generated assembly
  • [ ] #10858 Enumerate files deterministically when compiling *.cs or *.vb
  • [x] #11015 Symbol.Locations has indeterministic ordering of locations between compilations with same inputs
  • [x] #11990 The order in which type forwarders are emitted to ExportedType table is non-deterministic
  • [ ] #17121 Document compiler inputs for determinism (for cacheability)
  • [x] #23020 Generation of the GetHashCode method of anonymous types is not deterministic
  • [x] #30439 Produce the same NaN bits on different platforms
  • [x] #37527 Constant folding produces different IL depending on host architecture
  • [ ] #23969 Diagnose incomplete pathmap
Area-Compilers Concept-Determinism Feature Request Language-C# Language-VB Story

All 28 comments

@gafter Would you be so kind as to explain why making the compilers deterministic is important? You are probably envisioning scenarios I'm not aware of. I'm just curious :)

Not that I would claim to know the mind of @gafter, but consider a continuous build system which uses the output of one stage to know whether or not it has to recompile other source. If the same input always produces the exact same output, you can easily tell what binaries have _actually_ been affected by a change. If the binaries will always change anyway (because they include timestamps, random GUIDs etc) then you can't do this.
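As a rough illustration of the point above (a sketch only, not how any particular build system works), the following compares output digests to decide whether dependents must be rebuilt. The function names and the byte strings standing in for compiler output are hypothetical.

```python
import hashlib
from typing import Optional


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a build output's bytes."""
    return hashlib.sha256(data).hexdigest()


def dependents_need_rebuild(output: bytes, previous_digest: Optional[str]) -> bool:
    """Hypothetical policy: rebuild dependents only when the output bytes changed."""
    return previous_digest is None or digest(output) != previous_digest


# Illustrative stand-ins for two builds of the same, unchanged source.
deterministic_build_1 = b"IL for Foo.dll"
deterministic_build_2 = b"IL for Foo.dll"
# A nondeterministic compiler embeds something like a timestamp or random GUID.
stamped_build_1 = b"IL for Foo.dll" + b"|2015-02-10T10:00:00"
stamped_build_2 = b"IL for Foo.dll" + b"|2015-02-10T10:05:00"

# Same bytes: dependents can be skipped.
print(dependents_need_rebuild(deterministic_build_2, digest(deterministic_build_1)))
# Bytes differ only by the embedded stamp, yet dependents look dirty.
print(dependents_need_rebuild(stamped_build_2, digest(stamped_build_1)))
```

With the stamped outputs, every build looks like a change even though the source is identical, which is exactly the case a deterministic compiler avoids.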

Another important reason is that with a deterministic or reproducible build you can now verify that the binary you got from somewhere (e.g. NuGet) is really built from the source code you have access to and wasn't modified/tampered with.

Debian is doing a push to move most of its packages to be reproducible: https://wiki.debian.org/ReproducibleBuilds

The Tor project also put a lot of time into this: https://blog.torproject.org/blog/deterministic-builds-part-one-cyberwar-and-global-compromise

Thanks for sharing, greatly appreciated!

Yeah, what @jskeet and @akoeplinger said.

#926 will not be addressed in Update 1. Aiming for the next full release.

#1506 will not be addressed in Update 1. Aiming for Update 2.

#2184 might be addressed in Update 2.

Just my 2 cents: based on the checklist, the msbuild flags /deterministic and /pathmap look like they are already implemented, but they are not documented anywhere. Any chance of fixing that?
https://github.com/MicrosoftDocs/visualstudio-docs/issues/361

Thanks @martinsuchan for pointing out those issues.

Documenting /pathmap is tracked by https://github.com/dotnet/docs/issues/1800 (do chime in there to voice your interest).

I filed another documentation issue just now for /deterministic at https://github.com/dotnet/docs/issues/3828

Why is the /deterministic flag optional in many compilers? Is there a penalty or cost attached that can be avoided by running csc without /deterministic? Otherwise, if it is all about bringing goodness, does it have to be optional, hidden behind a flag?

@kasper3 I think the only reason determinism isn't the default is to avoid surprising customers who have been using the compiler (without determinism) for a long time.
Deterministic assemblies have strange timestamps, which would surprise many people if the default changed. Determinism also prohibits the use of wildcards in the assembly version.
That's why customers have to make an explicit choice to turn determinism on.

@kasper3 @jcouv BTW, /deterministic is the default for .NET Core SDK projects, so over time it should be the default for more and more new projects.

@svick Good point.
Filed https://github.com/dotnet/project-system/issues/3438 to update the desktop templates (so at least newly created projects can use determinism).

@jcouv, the still-unchecked items in the first post can be revisited. Some of them are closed issues (some are discussion-going-nowhere open issues).

Wondering, are there any plans to make UWP app builds deterministic by default? Timestamps and wildcards in assembly versions are not really important when publishing apps to the Store.
Last time I was testing deterministic builds for UWP apps it was not possible to produce identical packages because WinMDExp and MakePri.exe tools from the Windows SDK build pipeline don't support producing deterministic outputs yet.
Also since Microsoft Store uses differential updates for downloading new packages, having deterministic builds by default could save tons of Store traffic basically for free.

Tagging @MichalStrehovsky @sergiy-k for UWP question.

This came up in the context of .NET Native in the past (#23456) but I couldn't find the owners of the tools in question. Maybe @tarekgh would know for MakePri?

@axelandrejs can help answer the MakePri question.

MakePri is expected to produce identical output for identical input. Now, it is possible that some of the inputs change unexpectedly, since MakePri parses the file system. If you have two files that were produced over the same set of inputs but are not identical, can you share the files?

Note that if you are implementing your own build pipeline, we now have a programmatic equivalent to MakePri. See https://msdn.microsoft.com/en-us/library/windows/desktop/mt845690(v=vs.85).aspx . It avoids dependencies on the file system and as such takes out some of the potential non-determinism. Also, it's faster. We are developing a tool that supports specifying all the data needed to build a UWP resources file via a relatively simple XML, in case that would help.

Does anyone know if resgen.exe is deterministic?

@axelandrejs @tarekgh Reviving the discussion: MakePri does not produce deterministic outputs, and neither does WinMDExp. I just checked again in VS 15.7.4. There is a simple repro solution available in #23456. Just build the solution twice in the same clean folder and extract all appxupload/appx files in it.
The expected result is identical content in both folders; in reality there are differences in resources.pri and Winrt.Component.winmd.
Any chance of finding the product owners of those tools and making them deterministic by default?

Files with different checksum:

AppxBlockMap.xml
AppxSignature.p7x
ReproTest_1.0.0.0_x86.appx
AppxMetadata\AppxBundleManifest.xml
ReproTest_1.0.0.0_x86\AppxBlockMap.xml
ReproTest_1.0.0.0_x86\AppxSignature.p7x
ReproTest_1.0.0.0_x86\resources.pri
ReproTest_1.0.0.0_x86\Winrt.Component.winmd
ReproTest_1.0.0.0_x86\AppxMetadata\CodeIntegrity.cat
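
A list like the one above can be produced by hashing both extracted build folders and comparing the results. The following is a minimal sketch under the assumption that the packages have already been extracted into two directories; the function names are hypothetical and this is not the script used in the thread.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 digest of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differing_files(build_a: Path, build_b: Path) -> List[str]:
    """List relative paths whose contents differ, or that exist on one side only."""
    a, b = tree_digests(build_a), tree_digests(build_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

For example, `differing_files(Path("ReproTest.build1"), Path("ReproTest.build2"))` would return an empty list for a fully deterministic pipeline. Note that container files such as .appx differ whenever anything inside them differs, so the interesting entries are the leaf files.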

@axelandrejs may be able to help better here.

The problem is not actually makepri. The problem is the XBF files, specifically App.xbf and MainPage.xbf. They are different between runs.

By default, XAML files are embedded as BLOBs into the PRI files to improve performance. This means that if the XBF files change, you will see the difference in the resources.pri file. You can see this for yourself by dumping the contents of the resources.pri file with the following command:

makepri dump /dt detailed /if [the resources.pri file] /of [some .xml file name]

I did this for the two files you had in ReproTest.build1 and ReproTest.build2. You can see the App.xbf contents below.

@LarryOsterman can speak to the .winmd differences.
@jevansaks can speak to the .xbf differences.

WEJGAJYBAABaAAAAAgAAAAEAAAB4AAAAAAAAAHYBAAAAAAAAegEAAAAAAAB+AQAAAAAAAIIBAAAAAAAAhgEAAAAAAAAzNjZCNDlDODVFRjdEOTIyQkRERTQ4RTA2MzY3MUJCMQAICAAzAGEAEiodUrgWAIzg/iADAQAAAAAAAAAAAAAAAwAAADkAAABoAHQAdABwADoALwAvAHMAYwBoAGUAbQBhAHMALgBtAGkAYwByAG8AcwBvAGYAdAAuAGMAbwBtAC8AdwBpAG4AZgB4AC8AMgAwADAANgAvAHgAYQBtAGwALwBwAHIAZQBzAGUAbgB0AGEAdABpAG8AbgAAACwAAABoAHQAdABwADoALwAvAHMAYwBoAGUAbQBhAHMALgBtAGkAYwByAG8AcwBvAGYAdAAuAGMAbwBtAC8AdwBpAG4AZgB4AC8AMgAwADAANgAvAHgAYQBtAGwAAAAPAAAAdQBzAGkAbgBnADoAUgBlAHAAcgBvAFQAZQBzAHQAAAAAAAAAAAAAAAAAAAAAAAAAAwAAAAAAAAABAAAAAgAAAAEAAAAAAAAATgAAABIAAAAAAAADAQABAAAAeAADAgAFAAAAbABvAGMAYQBsAAsNAAAAUgBlAHAAcgBvAFQAZQBzAHQALgBBAHAAcAAXH4AaSIALQgIAAAAAIQ==

VERSUS

WEJGAJYBAABaAAAAAgAAAAEAAAB4AAAAAAAAAHYBAAAAAAAAegEAAAAAAAB+AQAAAAAAAIIBAAAAAAAAhgEAAAAAAAAzNjZCNDlDODVFRjdEOTIyQkRERTQ4RTA2MzY3MUJCMQAICAAAAAAAHQT6JAASAIpTAHkAcwB0AGUAbQAuAE8AAwAAADkAAABoAHQAdABwADoALwAvAHMAYwBoAGUAbQBhAHMALgBtAGkAYwByAG8AcwBvAGYAdAAuAGMAbwBtAC8AdwBpAG4AZgB4AC8AMgAwADAANgAvAHgAYQBtAGwALwBwAHIAZQBzAGUAbgB0AGEAdABpAG8AbgAAACwAAABoAHQAdABwADoALwAvAHMAYwBoAGUAbQBhAHMALgBtAGkAYwByAG8AcwBvAGYAdAAuAGMAbwBtAC8AdwBpAG4AZgB4AC8AMgAwADAANgAvAHgAYQBtAGwAAAAPAAAAdQBzAGkAbgBnADoAUgBlAHAAcgBvAFQAZQBzAHQAAAAAAAAAAAAAAAAAAAAAAAAAAwAAAAAAAAABAAAAAgAAAAEAAAAAAAAATgAAABIAAAAAAAADAQABAAAAeAADAgAFAAAAbABvAGMAYQBsAAsNAAAAUgBlAHAAcgBvAFQAZQBzAHQALgBBAHAAcAAXH4AaSIALQgIAAAAAIQ==
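
The two blobs above share a long common prefix and suffix, so it can help to locate the differing bytes programmatically. Below is a small sketch (the function name is hypothetical, and the toy inputs are not the real XBF data) that decodes two base64 strings and reports the byte ranges where they differ.

```python
import base64
from typing import List, Optional, Tuple


def differing_ranges(blob_a: str, blob_b: str) -> List[Tuple[int, int]]:
    """Return half-open (start, end) byte ranges where two base64 blobs differ."""
    a, b = base64.b64decode(blob_a), base64.b64decode(blob_b)
    longest = max(len(a), len(b))
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i in range(longest):
        # A position past the end of either blob counts as a difference.
        same = i < len(a) and i < len(b) and a[i] == b[i]
        if not same and start is None:
            start = i
        elif same and start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, longest))
    return ranges


# Toy inputs for illustration only; paste the two dumped blobs here instead.
toy_a = base64.b64encode(b"XBF\x00header-AAAA-tail").decode("ascii")
toy_b = base64.b64encode(b"XBF\x00header-BBBB-tail").decode("ascii")
print(differing_ranges(toy_a, toy_b))
```

Running this on the two App.xbf dumps would narrow the nondeterminism down to specific offsets in the file, which is a more useful starting point for the XAML compiler owners than a checksum mismatch.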

Added #375 to the list of issues.

@gafter, just as an FYI, decimal parsing should now respect all digits given in netcoreapp3.0 and later.

@tannergooding Thank you. That just confirms that different runtimes do it differently from each other.

Is there decimal parsing code that we could copy or adapt into Roslyn?

@gafter, yes. The logic lives here: https://source.dot.net/#System.Private.CoreLib/shared/System/Number.Parsing.cs,f156a872d71c54fd,references

Most of this is similar to the floating-point parsing code Roslyn already has. That is, TryStringToNumber converts the string into a digit buffer and scale (CoreFX also tracks the sign, but I believe Roslyn handles that separately).

It then converts that into the actual decimal metadata in TryNumberToDecimal, which is also where the rounding and additional digit considerations occur.
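
The two-stage shape described above can be sketched as follows. This is only an illustration of the idea, not the CoreLib or Roslyn code: the function names, the digit-buffer convention (value = 0.digits × 10^scale), and the use of Python's `Decimal` as the target type are all assumptions, and the rounding that the real conversion step performs for over-long inputs is deliberately left out.

```python
from decimal import Decimal
from typing import Tuple


def string_to_number(text: str) -> Tuple[bool, str, int]:
    """Stage 1: split a plain decimal literal into (negative, digits, scale).

    Assumed convention: value = 0.<digits> * 10**scale, with leading and
    trailing zeros stripped from the digit buffer.
    """
    text = text.strip()
    negative = text.startswith("-")
    if text[:1] in ("+", "-"):
        text = text[1:]
    whole, _, frac = text.partition(".")
    digits = whole + frac
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError(f"not a plain decimal literal: {text!r}")
    scale = len(whole)
    # Each leading zero dropped moves the decimal point one place to the left.
    stripped = digits.lstrip("0")
    scale -= len(digits) - len(stripped)
    digits = stripped.rstrip("0")
    if not digits:
        return negative, "", 0
    return negative, digits, scale


def number_to_decimal(negative: bool, digits: str, scale: int) -> Decimal:
    """Stage 2: build an exact Decimal from the digit buffer (no rounding here)."""
    if not digits:
        return Decimal(0)
    exponent = scale - len(digits)
    return Decimal((1 if negative else 0, tuple(int(c) for c in digits), exponent))


print(number_to_decimal(*string_to_number("123.4500")))
```

Keeping every digit in the buffer until the second stage is what lets the conversion consider all digits given; a parser that truncates earlier would give results that depend on where it cuts.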

When I was first added to this issue I wasn't on GitHub. But I am now, and GitHub has started notifying me of new comments. :) For the XBF issue, are you still seeing it? I can file an issue for the XAML compiler team to take a look.

@jevansaks I've added a repro for deterministic UWP app building here: https://github.com/dotnet/roslyn/issues/23456
I'm quite sure it hasn't been fixed yet.
