Home: Nupkg compression is bad

Created on 5 Jul 2015 · 16 Comments · Source: NuGet/Home

Currently nupkg compression doesn't seem to de-dup multiple files. It also does a poor job at compressing XML. We doubled the size of our packages when we added localized docs. I haven't done a ton of investigation here, but I bet there are a bunch of things NuGet could do better.

DCR

All 16 comments

What tools are you using to pack?

nuget.exe pack on a nuspec

NuGet is currently using the packaging APIs rather than ZipArchive. There is potential for improvement, but this doesn't seem to be high on the immediate to-do list.

I tested ZipArchive when I created this bug and didn't find much benefit. I also tried some other zip utilities with different compression quality and didn't see much benefit.

LZMA on the binaries showed some major wins (~2/3 the size of zip), and much greater ones when files were very similar, as would be the case for cross-compiled implementations. Using other compression tech for XML could potentially provide better wins there too. I didn't get a chance to try it, but there is a new standard for XML compression: http://www.w3.org/XML/EXI/.

Suppose the XML is represented in an EXI compressed format within the container and then the entire container is compressed with LZMA. I think that'd be a significant savings.

EXI might even be something to look at for the docs on disk, assuming we could get VS support.
/cc @davidfowl

The downside of fancier compression is that it means breaking compat with older clients.

You could do it on the server. If the client tells you it supports the new format, give it to them. Otherwise give them the old format.

Not everyone is using NuGet.org as a server. I think this needs to be thought through further, with a migration strategy that considers compatibility for older clients.

@csharpfritz of course. I'm not suggesting anything breaking here. As I mentioned, it can be something opt-in by the client and optional for the server, sort of how Accept-Encoding and Content-Encoding work with HTTP: the client tells the server it can understand the new format, and the server tells the client what format it's giving it.
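The opt-in negotiation described here can be sketched in a few lines. This is only an illustration of the idea, assuming a client that lists the formats it understands in order of preference. The format names and the function `choose_package_format` are hypothetical; NuGet defines no such negotiation.

```python
# Hypothetical format identifiers, invented for this sketch.
LEGACY_FORMAT = "zip-deflate"


def choose_package_format(client_accepts, server_available):
    """Pick the first format the client lists that the server can serve.

    Falls back to the legacy format, which every existing client understands.
    """
    for fmt in client_accepts:
        if fmt in server_available:
            return fmt
    return LEGACY_FORMAT


# An older client states no preference and gets the legacy format.
old_client = choose_package_format([], {"zip-deflate", "zip-lzma"})

# A newer client prefers LZMA, and this server has it.
new_client = choose_package_format(
    ["zip-lzma", "zip-deflate"], {"zip-deflate", "zip-lzma"}
)

# A newer client talking to a server that only stores the legacy format.
old_server = choose_package_format(["zip-lzma", "zip-deflate"], {"zip-deflate"})
```

Because the fallback is always the legacy format, neither an older client nor an older server has to change for the exchange to keep working.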

+1 for LZMA. I've switched from NSIS, which has LZMA compression, to Squirrel.Windows (uses NuGet packages) and my installer has doubled in size.

Ideally there should be an option for choosing between a handful of compression methods. Faster decompression may also be favored over a smaller package size depending on the use case.

To be clear here, you can significantly improve compression while still using ZIP and remaining compatible with all legacy Zip (DEFLATE) clients. You can use e.g. Zopfli on the DEFLATE streams or even just use 7Zip to generate the ZIP, set to maximum compression.

you can significantly improve compression while still using ZIP

I guess that depends on your definition of significant. I only saw gains under 5% by tweaking DEFLATE. A couple of problems are that the window size for DEFLATE is too small and zip isn't cross-file. The only ways I saw significant gains were by using cross-file compression with a significantly larger compression window.
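The cross-file point can be checked with a small experiment using Python's standard library. This is a synthetic sketch, not a measurement of real packages: the two "binaries" are random bytes with a shared body (an assumption standing in for cross-compiled assemblies), which exaggerates the effect because random data does not compress at all on its own.

```python
import io
import lzma
import os
import zipfile

# Two stand-in "cross-compiled" binaries: 200 KB of shared random bytes
# plus a short unique tail. The paths are illustrative only.
shared = os.urandom(200_000)
files = {
    "lib/net45/demo.dll": shared + b"net45",
    "lib/netstandard1.0/demo.dll": shared + b"netstandard1.0",
}

# ZIP with DEFLATE: every entry is compressed independently, so the second
# copy of the shared bytes cannot refer back to the first.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
    for name, data in files.items():
        zf.writestr(name, data)
zip_size = len(buf.getvalue())

# Solid LZMA stream: all files are concatenated and compressed once, with a
# dictionary far larger than DEFLATE's 32 KB window.
solid_size = len(lzma.compress(b"".join(files.values()), preset=9))

print(f"zip/deflate: {zip_size} bytes, solid lzma: {solid_size} bytes")
```

On this input the solid stream should come out at roughly half the size of the zip, since the second file is almost entirely a repeat of the first. Real binaries will show a smaller gap.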

Bump: for Xenko, we reduced our package from 260 MB to 140 MB by using 7z inside the package (automatically decompressed on install). We reused Microsoft.DotNet.Archive to deduplicate files too.
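For readers unfamiliar with file deduplication, here is a minimal sketch of the general idea: store each distinct file content once, keyed by its hash, and keep an index from path to hash. This is not the Microsoft.DotNet.Archive API. The functions `deduplicate` and `restore` and the sample paths are made up for illustration.

```python
import hashlib


def deduplicate(files):
    """Split {path: bytes} into unique blobs plus a path -> digest index."""
    blobs = {}  # digest -> content, stored once
    index = {}  # path -> digest
    for path, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        blobs.setdefault(digest, data)
        index[path] = digest
    return blobs, index


def restore(blobs, index):
    """Rebuild the original {path: bytes} mapping at install time."""
    return {path: blobs[digest] for path, digest in index.items()}


package = {
    "lib/net45/demo.dll": b"same bytes",
    "ref/net45/demo.dll": b"same bytes",
    "lib/net45/demo.xml": b"<doc/>",
}
blobs, index = deduplicate(package)
```

Here three paths collapse to two stored blobs, and `restore(blobs, index)` returns the original mapping.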

However, since there is no init.ps1 anymore in new NuGet, we can't rely on that anymore...

@zhili1208 @rrelyea Is this a definitive close, or might this be reevaluated later?
Comparing 7z to zip, I am sure having packages half the size (and even smaller with duplicate files w/ ref assemblies) would benefit a lot of people due to faster download/deployment.

Yeah, with the current NuGet.org limit I can't even serve the latest TensorFlow, because its largest binary is still 260 MB on its own after zip compression. Either increase the limit on NuGet.org, or let us use 7z.

@rrelyea I am facing a combination of poor compression and the NuGet.org package size limit with TensorFlow binaries (see the issue linked above).

Switching to ZIP+LZMA from ZIP+Deflate reduces the size of packed binaries from ~400MB to ~100MB. I am sure it would save NuGet.org a lot of traffic if adopted for larger packages.
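ZIP+LZMA here means keeping the ZIP container but storing entries with compression method 14 (LZMA) instead of method 8 (DEFLATE). Python's `zipfile` module supports both, so the difference can be sketched as below. The payload is synthetic (an assumption chosen so that repeats fall outside DEFLATE's 32 KB window) and the entry name is illustrative, so the numbers say nothing about the TensorFlow binaries themselves.

```python
import io
import os
import zipfile

# One synthetic 300 KB "binary" made of a 100 KB random block repeated three
# times. The repeats are further apart than DEFLATE's 32 KB window reaches.
block = os.urandom(100_000)
payload = block * 3


def build_zip(method):
    """Return the bytes of a one-entry ZIP written with the given method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr("runtimes/native/demo.bin", payload)
    return buf.getvalue()


deflate_zip = build_zip(zipfile.ZIP_DEFLATED)
lzma_zip = build_zip(zipfile.ZIP_LZMA)

# The LZMA archive is still a ZIP file and reads back through the same API,
# provided the reader implements method 14.
with zipfile.ZipFile(io.BytesIO(lzma_zip)) as zf:
    restored = zf.read("runtimes/native/demo.bin")

print(f"deflate: {len(deflate_zip)} bytes, lzma: {len(lzma_zip)} bytes")
```

Each entry is still compressed on its own, so unlike a solid 7z archive this does not remove duplication across files. The gain comes from LZMA's larger dictionary and stronger modeling within each entry.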

@rrelyea what exactly is the problem here with LZMA? It is not a breaking change to allow something that was not allowed previously. Older clients won't be able to download new packages compressed with LZMA, but those are new packages. Previously existing packages will still work with older clients.
