Onnxruntime: Nuget PDB files should not be included in the nuget

Created on 26 Dec 2019 · 9 Comments · Source: microsoft/onnxruntime

I appreciate that debug symbols are important for developers, but symbol packages can be published separately, saving a lot of space for anyone who reuses the NuGet package, as mentioned here.
To see how to publish symbol packages, there is a guide here:
https://docs.microsoft.com/en-us/nuget/create-packages/symbol-packages-snupkg
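
For reference, a minimal sketch of what the linked guide describes for an SDK-style project: setting `IncludeSymbols` and `SymbolPackageFormat` makes `dotnet pack` emit a `.snupkg` next to the `.nupkg`. Whether the onnxruntime build could use this as-is is an assumption. The snupkg format is aimed at portable PDBs, so the native Windows PDB discussed here may need a different route.

```xml
<!-- Sketch only: goes in the .csproj of the package being built. -->
<PropertyGroup>
  <!-- Produce a symbol package alongside the main package. -->
  <IncludeSymbols>true</IncludeSymbols>
  <!-- Use the .snupkg format rather than the legacy .symbols.nupkg. -->
  <SymbolPackageFormat>snupkg</SymbolPackageFormat>
</PropertyGroup>
```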

contributions-welcome wontfix enhancement

All 9 comments

Just a quick note, we prefer having these symbols part of the package. It makes a lot of things easier. The overhead is also not that big compared to the size of dependencies.

Hi and Happy New year @nietras

I think that the decompressed package now weighs in at 578MBs with 2 symbol packs and the included MacOS dylib.

As @hanabi1224 explains here, the size is an issue for redistribution to users rather than developers. I make simplification packages for end-user tools, so the debugging portion isn't something that my end users should need to worry about. They build the tool and use it. If or when they find a bug, I fix it and re-ship the package. I would like them to be able to build from a package that could be around a fifth of the size.

I appreciate that other users have different needs. So I suggested the symbol packages even though my research suggests they may be a bit cumbersome to use.

@nietras, What overhead do you encounter?

Happy new year to you too @yanyas

I definitely understand and think disk footprint is important, but I do not think the pdb files are the main culprit here. Having pdb files available on a symbol server is an option, but this often becomes a problem if the server does not stay available indefinitely.

The main problem is how OnnxRuntime employs a "one size fits all" approach to nuget packaging, made worse by there being "one big" nuget package for each of the different execution providers. This makes class library use of OnnxRuntime problematic, since the class library ends up defining the execution provider for all of its users.

In my humble opinion the packaging and project structure needs to be overhauled to be modular and pluggable; this would match the true scope and benefit of the runtime. Below I try to give an example of how OnnxRuntime could be packaged as a small underlying set of packages that can then be referenced from meta-packages if need be. Each execution provider should have its own set of packages.

Prefix below with Microsoft.ML. to match current naming. The structuring below is for C# mainly, but same principles can be applied to C/C++ users.

OnnxRuntime.Interop (basic and general interface for OnnxRuntime from C#, does not refer to any runtimes or similar)
OnnxRuntime.runtimes.win-x64 (native dlls only, onnxruntime.dll)
OnnxRuntime.runtimes.linux-x64 
OnnxRuntime.runtimes.macOS-x64
OnnxRuntime.MKLML.Interop (this defines an extension method to allow adding the execution provider from C#)
OnnxRuntime.MKLML.runtimes.win-x64 (native dlls only, onnxruntime-mklml.dll and mklml.dll etc.)
OnnxRuntime.MKLML.runtimes.linux-x64
OnnxRuntime.MKLML.runtimes.macOS-x64

Naming is just examples :) You can replace MKLML with whatever execution provider there is, e.g. CUDA, TensorRT etc. You can pick and choose which runtime you want by picking packages. You can have class libraries refer to just OnnxRuntime.Interop, and in the final "composition root", e.g. the application, you can then add whatever execution runtimes you want for whatever platforms you need. This would reduce disk footprint a lot for our use case, allow for publishing of all supported execution providers without making a huge one-size-fits-all package, and, not least, avoid the "build from source" solution there is right now for any execution providers not available as nuget packages.
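
To make the split concrete, here is a rough sketch of how the two kinds of project might reference such packages. Every package name below is taken from the example list above with the Microsoft.ML. prefix added, so none of them is a published package, and the version number is a placeholder.

```xml
<!-- Class library .csproj: depends only on the interop package (hypothetical name). -->
<ItemGroup>
  <PackageReference Include="Microsoft.ML.OnnxRuntime.Interop" Version="1.0.0" />
</ItemGroup>

<!-- Application .csproj (the composition root): picks providers and platforms.
     Package names and versions are placeholders, not published packages. -->
<ItemGroup>
  <PackageReference Include="Microsoft.ML.OnnxRuntime.runtimes.win-x64" Version="1.0.0" />
  <PackageReference Include="Microsoft.ML.OnnxRuntime.MKLML.Interop" Version="1.0.0" />
  <PackageReference Include="Microsoft.ML.OnnxRuntime.MKLML.runtimes.win-x64" Version="1.0.0" />
</ItemGroup>
```

The class library never sees a native binary, and only the application decides which runtimes end up on disk.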

You can still have a single meta-package OnnxRuntime that encompasses the default components for ease of use, but this allows any real world use to be much more flexible.

If there is any interest in this I might make a more formal issue proposing the restructuring to give us the above.

@snnn

I now understand and appreciate your point about only being able to debug if the symbol server is available, which isn't always the case. I think that's a really good proposal. I'd like to hear what others think about this one, but you'd have my vote.

I would like them to be able to build from a package that could be around a fifth of the size,

Has a user explicitly complained about the size on disk? The primary disadvantage would be download time -- the loading or running time should not be impacted (only relevant dlls are loaded).

Each execution provider should have its own set of packages.

Fragmenting a single package into multiple packages is not without drawbacks (e.g. one for native assets, another for managed, a third for PDB files). Besides the user confusion of which package to install, there is also a namespace explosion, which increases maintenance and versioning friction.

@YanYas, can you simply delete the PDB and other unrequired files after the package is installed, if disk space is an issue? This way your application can keep the disk utilization to a bare minimum for the end users.
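
One way to automate that deletion is a small MSBuild target. This is a sketch only, assuming the native PDB is copied to the build output as `onnxruntime.pdb`; the target name is made up for illustration.

```xml
<!-- Sketch: delete the native PDB from the build output after each build.
     Assumes the file lands in $(OutDir) as onnxruntime.pdb. -->
<Target Name="DeleteOnnxRuntimePdb" AfterTargets="Build">
  <Delete Files="$(OutDir)onnxruntime.pdb" ContinueOnError="true" />
</Target>
```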

I think the package should at least provide a way for people to opt out of the large PDB file in a CI environment instead of doing it manually. Personally, I use the custom build targets below in Directory.Build.targets to achieve it.

<!-- Overwrite the large native PDB in the build output with a tiny placeholder. -->
<Target Name="RemoveOnnxRuntimePdb" AfterTargets="AfterBuild">
  <WriteLinesToFile
    File="$(OutDir)onnxruntime.pdb"
    Lines="dummy"
    Overwrite="true"
    Encoding="Unicode" />
</Target>

<!-- On publish, swap every native PDB that comes from a NuGet package for dummy.pdb.
     dummy.pdb is expected to sit next to this Directory.Build.targets file. -->
<Target Name="RemoveRuntimePdbFromNuget" AfterTargets="ComputeFilesToPublish">
  <ItemGroup>
    <RuntimePdbFromNugetToRemove
      Include="@(ResolvedFileToPublish)"
      Condition=" '%(ResolvedFileToPublish.PackageName)' != ''
          and '%(ResolvedFileToPublish.AssetType)' == 'native'
          and '%(Extension)' == '.pdb'
          " >
      <Dummy>$(MSBuildThisFileDirectory)dummy.pdb</Dummy>
    </RuntimePdbFromNugetToRemove>
    <ResolvedFileToPublish Remove="@(RuntimePdbFromNugetToRemove)" />
    <!-- Re-add the placeholder with the same publish metadata as the file it replaces. -->
    <RuntimeDummyPdbToAdd Include="%(RuntimePdbFromNugetToRemove.Dummy)">
      <AssetType>%(RuntimePdbFromNugetToRemove.AssetType)</AssetType>
      <CopyToPublishDirectory>%(RuntimePdbFromNugetToRemove.CopyToPublishDirectory)</CopyToPublishDirectory>
      <DestinationSubPath>%(RuntimePdbFromNugetToRemove.DestinationSubPath)</DestinationSubPath>
      <PackageName>%(RuntimePdbFromNugetToRemove.PackageName)</PackageName>
      <PackageVersion>%(RuntimePdbFromNugetToRemove.PackageVersion)</PackageVersion>
      <RelativePath>%(RuntimePdbFromNugetToRemove.RelativePath)</RelativePath>
    </RuntimeDummyPdbToAdd>
    <ResolvedFileToPublish Include="@(RuntimeDummyPdbToAdd)" />
  </ItemGroup>
  <!-- Message is a task, so it sits outside the ItemGroup. -->
  <Message Text="Removed native pdb files from nuget: @(RuntimePdbFromNugetToRemove)" Importance="High" Condition=" '@(RuntimePdbFromNugetToRemove)' != '' " />
</Target>

Nuget specifically has separate symbol packages for a reason. It is not good practice to include PDB files in your package, especially when they are 130MB. :-( It is also not reasonable to ask people to manually remove files from a nuget package they've downloaded. All this is to say... yes please on this issue.

@nietras Thanks for writing up the proposal. In the upcoming 1.2 release (early March), we plan on separating the managed assembly into a separate package called Microsoft.ML.OnnxRuntime.Managed. Each execution provider will be delivered in its own separate package without the managed assembly. This aligns with your proposal partially.

At this time we do not plan to separate the pdb files and x86/linux/mac binaries into separate packages for the sake of simplicity of our first party users.

xref: https://github.com/microsoft/onnxruntime/issues/2184

I'm having the same issue. Including the pdb file vastly increases the package size. Could we put the pdb file into a separate folder (at least not runtime), or put it in a separate symbol package, like ML.NET does?
