Sdk: Discussion: Change default DebugType to embedded for .NET Core 3

Created on 19 Nov 2018 · 37 comments · Source: dotnet/sdk

Inspired from this issue and twitter thread
https://github.com/dotnet/docs/pull/9110
https://twitter.com/rrelyea/status/1064587440935956482

I want to propose that the default DebugType in the SDK change from portable to embedded. I would also propose that Sourcelink be included/used by default.

At the very least, there should be a single property, ConfigureOpenSourceDefaults, that turns these on as a set of best practices.

My rationale is driven by the pit of success. I believe that out-of-the-box, developers should produce fully debuggable code. This is similar to how JS typically generates/includes sourcemaps by default, so there is precedent.
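
For reference, the manual opt-in today is a single existing MSBuild property – a minimal sketch of what the proposed default would replace:

<PropertyGroup>
    <!-- the SDK's current default is "portable"; the proposal is to make "embedded" the default -->
    <DebugType>embedded</DebugType>
</PropertyGroup>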

A few notes:

  • Embedding a PDB yields a file that's 20-30% bigger.
  • Some .NET tools, like .NET Native do not support symbol server; the symbols need to be alongside the dll
  • Symbol server significantly slows down VS debug startup as it probes for files

Some things, like Appx/MSIX packaging, strip out PDB files by default. They rely on extra properties (AppxPackageIncludePrivateSymbols) to include them, which I'm sure 99% of people wouldn't know to add. Without them, the stack traces aren't as useful.

Symbol servers may be blocked by corporate firewalls, or otherwise affected by low-bandwidth connections (think airplanes, trains, mobile hotspots, and anywhere else with slower/spotty connections).

I believe that the extra size of the files with embedded PDBs is worth the trade-off in most cases, as it presents the best debugging experience without any extra configuration. It just works.

Thanks

All 37 comments

I can confirm the slow symbol server load is super painful; I have a related dev community issue open on it. If someone is on the latest previews, which means fewer cache hits, it's even worse. I'm not saying optimize for this, only that the case exists and we hit it often.

@tmat

IMHO, we should not change the default. It's basically breaking to add 20-30% size. This is not just on disk, but can also impact (virtual) memory usage.

The "very least" option is something we can consider. But I'm not sure about how it would pull in source link. How would we know which provider to use? And even if we did, how would we determine the package version?

For sourcelink, you can add all of them and it'll use the right one. In fact it's the recommended approach for dealing with multiple hosts. Sourcelink could be bundled in the SDK like NuGet, and the others.

I think the idea of an opt-in ConfigureOpenSourceDefaults property gives the SDK a chance to provide actionable best practices that people don't have to think too hard about. That doesn't change the default, but makes it really easy for developers to follow the guidance for the scenario.

I want to add another angle here: I maintain an exceptions logging library; we deal with exceptions, and we get issues reported from exceptions. Stack traces are golden. They're incredibly useful. I strongly believe the default experience for any app should have stack traces include the file paths and line numbers. For this to happen the deployment platform must be able to read the symbols, and they have to be there. With SourceLink, it even means the exceptions have the git hash in the URL path, pointing directly to the exact code throwing. That's awesome. But only if we get it. Without users having symbols in the first place it's all wasted effort.

That rich experience should be the default. If size is a concern, I propose we focus on stripping what users don't want in a publish phase rather than having options to the detriment of many up front. If someone is concerned about mobile or container sizes, what are the chances we could allow a slimming phase to happen, as Xamarin can today?

I don't see how we can make deployment choices for everyone up at the package level. That's a deployment concern and a deployment preference. Why is someone's small-footprint deployment priority governing my debugging and error logging experience? The only way to work around this as a package author (as far as I'm aware) is to force the PDB files into the package or to switch to embedded symbols. But both only fulfill me using my own libraries, and take the control away from everyone who's trying to do that small deployment and is okay with sacrificing that functionality. For this reason, I really suggest focus be on splitting deployment concerns into another phase than "how packages are hosted".

There's a related issue up there where embedded seems like the best long-term approach, but the last time we tested, .NET Full Framework wouldn't read the symbols and render the data in stack traces. That's something that needs fixing if not already done in the .NET 4.x line.

For sourcelink, you can add all of them and it'll use the right one.

How much overhead does that add to the build for the ones I'm not using?

In fact it's the recommended approach for dealing with multiple hosts.

Where is this recommended? I can see discussion of supporting mixed submodules, but nothing that says just ref them all.

I would vote for external PDB files by default, with SourceLink as an optional feature. NuGet packages should contain PDB files that can be filtered with the known asset metadata on PackageReferences:

<ItemGroup>
    <PackageReference Include="Contoso.Utility.UsefulStuff" Version="3.6.0">
        <ExcludeAssets>Symbols</ExcludeAssets>
    </PackageReference>
</ItemGroup>

Maybe it'd be possible to make whether or not symbols are included depend on the Debug/Release configuration.
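
A rough sketch of how that could look, building on the hypothetical symbol-filtering asset from the snippet above (the Condition attribute on metadata is standard MSBuild; whether NuGet would honor a "Symbols" asset is part of the proposal, not something it supports today as far as I know):

<ItemGroup>
    <PackageReference Include="Contoso.Utility.UsefulStuff" Version="3.6.0">
        <!-- keep symbols in Debug builds, drop them in Release builds -->
        <ExcludeAssets Condition="'$(Configuration)' == 'Release'">Symbols</ExcludeAssets>
    </PackageReference>
</ItemGroup>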

@nguerrera

For sourcelink, you can add all of them and it'll use the right one.

How much overhead does that add to the build for the ones I'm not using?

Not much, really. SourceLink retrieves the git remote origin, puts that URL through a parser, and sees what fits. Getting the current HEAD is also not much effort.

In addition to the concern around the size increase being essentially breaking, anything that discloses new source information in a binary is absolutely breaking.

I mean breaking if defaults change.

Opting in via a single property, then, solves the defaults issue.

I would vote for external PDB files by default, with SourceLink as an optional feature

This is status quo for those, yes?

Opting in via a single property, then, solves the defaults issue.

Please don't call it "ConfigureOpenSourceDefaults", though. Employers of the world will lose their collective s** if I start "enabling open source" just to link PDBs with private repositories.

This is status quo for those, yes?

Yes, and that's what I'm trying to change, at least with a single property to opt-in to a better stance for OSS libraries.

Please don't call it "ConfigureOpenSourceDefaults", though. Employers of the world will lose their collective s** if I start "enabling open source" just to link PDBs with private repositories.

I'm sure we can come up with a name. I think the key point here is that there are different security needs for closed source vs open source projects. The SDK can have different default guidance and behavior for each based on a single configuration property.

Symbol server significantly slows down VS debug startup as it probes for files

So are we saying that just the fact that the PDB file isn't inside the assembly, but is instead a separate file in the same directory, is enough to significantly slow down debug startup?
Or is it the process of retrieving PDB files from symbol server feeds?

I feel like this is vital to the discussion and the original post wasn't clear enough about this for me to understand.

My hot takes:

Source Link on by default: No. We are looking at this from the perspective of developers who build open source code. Meanwhile the majority of .NET apps are closed source.

PDBs embedded in the DLL: I like embedding the PDB in the NuGet package alongside the DLL more. It gives the developer the option to not deploy PDBs when publishing the bin directory.

I think the defaults today are fine. There are improvements I'd like to see:

  • Source Link being easier. Off by default but easy to enable.
  • Embedding PDBs in the NuGet package being easier to enable.
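
For context, the usual way to get PDBs into the .nupkg today is the AllowedOutputExtensionsInPackageBuildOutputFolder workaround – roughly:

<PropertyGroup>
    <!-- append .pdb to the extensions copied next to the DLLs in the package's lib/ folders -->
    <AllowedOutputExtensionsInPackageBuildOutputFolder>$(AllowedOutputExtensionsInPackageBuildOutputFolder);.pdb</AllowedOutputExtensionsInPackageBuildOutputFolder>
</PropertyGroup>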

@JamesNK what would be the benefits of having sourcelink in the SDK, if it isn't on by default?

  • You'd need knowledge of sourcelink existing to enable it
  • When you already know how to enable sourcelink, you're most likely also able to do

    • <Sdk Name="Sourcelink" Version="2.x" /> or

    • <PackageReference Include="SourceLink"/>

Why put stuff inside Microsoft.Net.Sdk, if this forces you to put SourceLink under the .NET SDK release cycle, when you gain nothing from it usability-wise?

EDIT: Even worse: you put a formerly NuGet-restored MSBuild SDK, which can be referenced at will, into an SDK that has to be installed manually... Scenario bound to happen: "I want to use SourceLink with a fancy new hosted git provider, but it's bundled into Microsoft.Net.Sdk now, so I now have to wait until AppVeyor has .NET SDK 3.3.700 images..."

I mean that configuration to enable Source Link easily with a property would be in the SDK.

Disclaimer: I'm not an SDK expert.

So are we saying that just the fact that the PDB file isn't inside the assembly, but is instead a separate file in the same directory, is enough to significantly slow down debug startup?
Or is it the process of retrieving PDB files from symbol server feeds?

The process of retrieving PDB files from a symbol server is very slow today. I'm not aware of any noticeable debugger-related perf issues between embedded PDBs and PDBs next to the file.

@JamesNK Source Link on by default: No. We are looking at this from the perspective of developers who build open source code. Meanwhile the majority of .NET apps are closed source.

I guess I'm aiming to introduce a notion of "actionable best practices" that are easy to enable. Different audiences will have different profiles, but I could imagine that there's one set of settings that are oriented around OSS and another that's for internal.

Well, there are two main ways of distributing SDK packages currently:

  • Pre-installed into your SDK folder with the dotnet-sdk install (or some other install I don't know of), distributed with slow dotnet-sdk releases
  • NuGetSdkResolver, which invokes NuGet at build time to restore the package from your feeds

For SourceLink to work you need the SourceLink MSBuild task binaries, which do the job of resolving the current HEAD, remote URLs, and the matching raw source file endpoints.
Each VCS provider has its own implementation of resolving origin URLs and source roots into raw download URLs.
If you were to bundle this into Microsoft.Net.Sdk (which is an SDK that will not get restored by the NuGetSdkResolver), any change or update to the package has to be distributed by an SDK that has months between major releases. There's technically nothing wrong with doing so; it's just less flexible.

(Correct me if I'm wrong about stuff, SDK overlords 😄)

@MeikTranel depending on the mechanism used, it could be possible to override the SDK's version of the tools with an updated version. Also, the SDK does seem to ship updates on a regular cadence, so it'd get picked up that way too.

@onovotny yeah, global.json is your friend here, but pinning versions for Microsoft.Net.Sdk doesn't magically install it. For anyone that doesn't use hosted build agents, this means that any requirement on the SDK has to revolve around asking Ops teams to update build images every time a useful tweak to the SDK comes along.
NuGet SDK resolution is 100x more flexible than this.

It would be ironic if, just when we can provide 3rd-party NuGet source server support, we decide to turn it off by default. However, we could change the default so that you use snupkg rather than symbols.nupkg (you seem to need to set the property yourself today):

    <SymbolPackageFormat>snupkg</SymbolPackageFormat>
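
For reference, a minimal sketch of the full opt-in today – IncludeSymbols is also needed for a symbols package to be produced at all (both are existing NuGet pack properties):

<PropertyGroup>
    <!-- produce a separate symbols package in the newer .snupkg format -->
    <IncludeSymbols>true</IncludeSymbols>
    <SymbolPackageFormat>snupkg</SymbolPackageFormat>
</PropertyGroup>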

We should at least document the experience as it is today, as well as how we want the default to be, so that they can be compared properly before making a decision.

Finally, I note that the issue with delays in Visual Studio (or any other debugger) being slow looking up symbols from symbol servers is really a bug in the debugger. Visual Studio definitely caches the result and is fast (does not hit the network), but can be slow if there are DLLs that it COULDN'T find. It really should cache this negative result and return quickly in that case as well. Anyway, we should not let bugs in other components drive the design/defaults of roughly unrelated things. We should instead log an issue to get things fixed properly.

cc @rrelyea

Embedded PDBs don't show full call stack info on .NET Framework < 4.7.2; it's only a problem there. For .NET Core I think we can use embedded as the default. It's a really nice proposal.

IMO new templates in v3 should include the MSBuild property to enable all the ins and outs of this feature. v2 projects could add the property to opt in to the behavior without an upgrade.

Since every NuGet package can use its own setup of debugging configurations (so we'll have embedded PDBs from package A, portable ones from B, and none from C), and the primary concern I see here is memory utilization and deployment size, maybe we should add a strip task that may run as part of the Release configuration to remove debug symbols. That way we're free to deploy the symbols however we like without affecting deployment size.
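
A rough sketch of what such a strip step might look like, hooked after publish (the StripSymbolsOnPublish property name is made up for illustration; note this only removes loose .pdb files – stripping embedded PDBs would require rewriting assemblies):

<!-- hypothetical opt-in: remove loose symbol files from the publish output -->
<Target Name="StripSymbols" AfterTargets="Publish" Condition="'$(StripSymbolsOnPublish)' == 'true'">
    <ItemGroup>
        <_LoosePdbFiles Include="$(PublishDir)**\*.pdb" />
    </ItemGroup>
    <Delete Files="@(_LoosePdbFiles)" />
</Target>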

Good discussion. Quite a few points and questions raised here.

First, I’d like to point out that many of these topics are discussed in this document.
Some information there is a bit out of date; I’ll update it soon to reflect recent developments.

Let me try to group and answer the questions:

  1. Changing the defaults for OSS projects in the .NET Core SDK
    > I want to propose that the default DebugType in the SDK change from portable to embedded.

We can’t change the default. As @nguerrera mentioned this would be a breaking change. I don't think we should change the default for v3 projects either. That said …

I think the idea of an opt-in ConfigureOpenSourceDefaults property gives the SDK a chance to provide actionable best practices that people don't have to think too hard about. That doesn't change the default, but makes it really easy for developers to follow the guidance for the scenario.

How about creating a package, something like "ProjectBuildPolicies.OpenSource.nupkg", that does what you're asking for – it sets various properties to values that we deem to be good for OSS projects.
No need for setting any property – just referencing the package from a project would opt into OSS settings.
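
A sketch of the kind of .props such a (hypothetical) package might carry – the property names below are existing SDK/NuGet/Source Link properties, but which ones the package would actually set is an open question:

<!-- e.g. build/ProjectBuildPolicies.OpenSource.props inside the hypothetical package -->
<Project>
    <PropertyGroup>
        <!-- publish the repository URL into the nuspec and embed sources the repo doesn't track -->
        <PublishRepositoryUrl>true</PublishRepositoryUrl>
        <EmbedUntrackedSources>true</EmbedUntrackedSources>
        <!-- ship symbols as a separate .snupkg rather than packing PDBs into the .nupkg -->
        <IncludeSymbols>true</IncludeSymbols>
        <SymbolPackageFormat>snupkg</SymbolPackageFormat>
    </PropertyGroup>
</Project>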

  2. Differences between PDBs embedded in DLLs, Portable PDBs included in NuGet packages, and Portable PDBs on a symbol server: reasoning and guidelines

The difference is in size of DLLs and packages. For some customers and in some scenarios the package size matters, for others it does not. Sometimes the DLL size matters, sometimes it does not.

Why might package size matter? Right now package restore for the dotnet/roslyn repository downloads 357 packages with a total size of 520 MB, and these take 2.6 GB in the NuGet cache. Increasing that means that every restore on a clean machine is slower and needs to move more data around. Why make restore slower and take more space on disk by downloading symbols for all packages when they will likely not be used? Why should a developer wait longer for restore when they might not even need to debug through 3rd-party code?

Similarly for DLL size. For some trivial utilities the extra size of the .exe/.dll doesn’t matter. On the other hand, consider Visual Studio. It may load hundreds of assemblies into the devenv.exe process. Since this is a 32-bit process, the address space is tight. We can’t afford to increase the size of loaded DLLs unnecessarily. Therefore Embedded PDBs are not an option for anything that ships in VS.

We designed the NuGet symbol acquisition to be pay-for-play. Many customers don’t ever want to step into a 3rd-party library. There is even a VS debugger option, "Just My Code", specifically designed for these customers. The symbols should be available only when they are needed. There are some gaps and inefficiencies in the symbol acquisition tooling (see [5] below). These issues can and should be fixed in such a way that does not regress other scenarios.

As for the guidelines - I don’t think there is a silver bullet approach that works for all cases. It really depends on what the DLLs and PDBs are used for. I’d suggest:

to use Embedded PDBs (DebugType = embedded)

  • for simple tools, utilities, build tasks, etc. that are internal to your organization or not expected to be used/referenced often (as in hundreds of thousands)
  • when building in CI builds – e.g. Roslyn binaries are built with embedded PDBs for PR validation. When tests crash during CI the crash dump has all symbols in it. We don’t need to publish these symbols anywhere.
  • for first-party binaries deployed into production – e.g. the Bing team builds assemblies that implement Bing services with Embedded PDBs and embedded sources. In their case the significant increase in size is less important than the benefit of having the DLLs completely self-contained, since they have many teams contributing many (thousands of) DLLs from many repositories. Managing source/symbol servers would be too complex.

to use Portable PDBs packaged in .snupkg

  • for all libraries published to NuGet.org

stop adding Portable PDBs to NuGet packages (.nupkg)

  • this was a temporary workaround that was needed before NuGet.org supported a symbol server
  • if there are scenarios that are not working without PDBs in the package, please file issues and let’s get them fixed

  3. Enable Source Link by default, or make it easier to enable

I would also propose that Sourcelink be included/used by default.

For sourcelink, you can add all of them and it'll use the right one. In fact it's the recommended approach for dealing with multiple hosts. Sourcelink could be bundled in the SDK like NuGet, and the others.

Unfortunately, this is not possible. In general the user needs to specify which source control provider is hosting the repository. They do so by selecting the right Source Link package (Microsoft.SourceLink.GitHub, Microsoft.SourceLink.GitLab, etc.).
For some providers, like GitHub.com or Azure DevOps, it is possible to infer the provider from the remote origin URL, since these have well-known domains.
However, for on-prem hosted providers (e.g. GitHub Enterprise, GitLab, TFS, etc.) there is no way Source Link can determine the provider from the domain. In these cases, a Source Link package referenced by the project checks whether it is the only Source Link package referenced. If so, it assumes that the domain belongs to the source control provider it supports. If multiple Source Link packages are referenced, then the project needs to specify which domains should be handled by each of them.
Besides that, some providers always require a bit of configuration – e.g. projects in TFS need to specify the server virtual directory, as it is impossible to infer from the URL.
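
Concretely, the multi-host/on-prem mapping described above looks roughly like this – the domain is made up, and the SourceLinkGitHubHost item is, as far as I know, the mechanism the GitHub package uses for GitHub Enterprise hosts:

<ItemGroup>
    <!-- on-prem GitHub Enterprise: tell the GitHub Source Link package which domain it owns -->
    <PackageReference Include="Microsoft.SourceLink.GitHub" Version="1.0.0" PrivateAssets="All" />
    <SourceLinkGitHubHost Include="github.contoso.com" />
</ItemGroup>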

There are improvements I'd like to see: Source Link being easier. Off by default but easy to enable.

What specific improvements do you want to see? Source Link packages are explicitly designed to be as simple to use as possible.

For the most common scenario enabling Source Link is done by referencing the package corresponding to the source control provider that hosts the repository, with no other configuration needed:
https://github.com/dotnet/sourcelink#using-sourcelink.
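
i.e. roughly this for a GitHub-hosted project (the version number is illustrative):

<PropertyGroup>
    <!-- put the repository URL into the package metadata so consumers can find the source -->
    <PublishRepositoryUrl>true</PublishRepositoryUrl>
</PropertyGroup>
<ItemGroup>
    <PackageReference Include="Microsoft.SourceLink.GitHub" Version="1.0.0" PrivateAssets="All" />
</ItemGroup>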

  4. Including/excluding symbols when deploying apps

That's a deployment concern and a deployment preference. Why is someone's small-footprint deployment priority governing my debugging and error logging experience?

maybe we should add a strip task that may run as part of the Release configuration to remove debug symbols

Some things, like Appx/MSIX packaging, strip out PDB files by default. They rely on extra properties (AppxPackageIncludePrivateSymbols) to include them,

Sounds like some tools already strip them. I agree that the developer should have the option to include/exclude all symbols when deploying their app. Maybe what is missing is a unified approach (setting) that works across all tools that produce deployment artifacts (Appx, MSIX, .NET Native, IL Linker, etc.).

Some .NET tools, like .NET Native do not support symbol server; the symbols need to be alongside the dll

Seems like something that can and should be fixed. I’d recommend filing a separate issue.

  5. Symbol acquisition

Symbol server significantly slows down VS debug startup as it probes for files

Definitely agree with @vancem that this is a bug that needs to be fixed, and there is no reason to change the design of PDB delivery via NuGet.

Symbol servers may be blocked by corporate firewalls, or otherwise affected by low-bandwidth connections (think airplanes, trains, mobile hotspots, and anywhere else with slower/spotty connections).

Is the firewall blocking symbol servers an issue for symbols.nuget.org? If a firewall is blocking nuget.org then you won’t be able to get packages in the first place.

I agree that symbol acquisition can be improved to accommodate the offline scenario. Today you can run nuget restore to fetch all packages from the server and then work offline. I think we need a similar tool to acquire all the symbols you’ll need to debug your project and its dependencies: an MSBuild task or a global tool that determines the dependencies and downloads their symbols for a specified platform/runtime. Similarly, it could take a memory dump file and fetch symbols for all binaries in the dump. Like nuget restore, which uses the NuGet cache, the tool would populate the local symbol cache debuggers use, to avoid downloading symbols multiple times. This would thus not require any changes in debuggers; they would just consult the cache as they already do. Although, perhaps the debuggers (or even VS and NuGet?) might need an easy-to-use offline mode switch to avoid attempting to download symbols that are not in the cache.

  6. Support for Portable PDBs in .NET Framework runtime stack traces

.NET Full Framework wouldn't read the symbols and render the data in stack traces. That's something that needs fixing if not already done in the .NET 4.x line.

Portable PDBs (and embedded PDBs) work on 4.7.2. They are not enabled in stack traces by default due to back-compat concerns. A configuration switch is needed.
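
If I recall correctly, the switch is the Switch.System.Diagnostics.IgnorePortablePDBsInStackTraces AppContext switch, set in app.config roughly as below – treat the exact switch name as an assumption to verify:

<!-- app.config for a .NET Framework 4.7.2+ app; switch name per my recollection, verify before relying on it -->
<configuration>
    <runtime>
        <AppContextSwitchOverrides value="Switch.System.Diagnostics.IgnorePortablePDBsInStackTraces=false" />
    </runtime>
</configuration>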

Note previous similar discussion in https://github.com/aspnet/Universe/issues/131.

As a followup to aspnet/EntityFrameworkCore#15316:

It sounds like what we then would need would be a way to publish an application while including symbols for all packaged assemblies. Like some dotnet publish --include-symbols that would – if necessary – load the symbols from the symbol server so that they can be distributed with the application.

/cc @SimonCropp

/cc @nguerrera @livarcocc @rrelyea @anangaur

@tmat it is not clear to me at this point what exactly this bug is tracking... Can you help clarify that?

I think we should create a separate issue for https://github.com/dotnet/sdk/issues/2679#issuecomment-485137716 and close this one.

Given .NET Core 3 has sailed, should this be renamed? Or closed?

@nguerrera can you point to the doc that explains how embedded symbols impact virtual memory usage? My understanding was that the runtime was smart enough to ignore that part of the assembly until it was needed.

https://github.com/dotnet/sdk/issues/2679#issuecomment-440030138
