Describe the problem you are trying to solve
As per #1858 and other places, it would be good to support zstd compression for component packages in channels. This would allow for smaller, faster-to-decompress components, which is a benefit all round.
Describe the solution you'd like
Support for .zst as a file extension and zstd decompression as part of Rustup.
Notes
This ought mostly to be constrained to the src/dist/ tree, particularly the manifestation.rs and component/package.rs files. Some test support will also be needed.
Since zstd generally compresses better and decompresses more quickly than xz, we should prefer .zst over .xz where it is present in the manifest.
Once implemented, the compiler team could start to produce channels with zstd-compressed content.
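As a rough illustration of the preference described in these notes, here is a minimal sketch of picking a tarball by file extension. The names `Compression`, `compression_from_name` and `pick_preferred` are hypothetical and are not taken from the rustup source; the only assumption about the artifacts is that they end in `.tar.gz`, `.tar.xz` or `.tar.zst`.

```rust
/// Compression formats a dist tarball might use (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    Gzip,
    Xz,
    Zstd,
}

/// Guess the compression format from a tarball file name or URL suffix.
pub fn compression_from_name(name: &str) -> Option<Compression> {
    if name.ends_with(".tar.zst") {
        Some(Compression::Zstd)
    } else if name.ends_with(".tar.xz") {
        Some(Compression::Xz)
    } else if name.ends_with(".tar.gz") {
        Some(Compression::Gzip)
    } else {
        None
    }
}

/// Pick the preferred candidate: zstd first, then xz, then gzip.
/// Returns None when no candidate has a recognised extension.
pub fn pick_preferred<'a>(candidates: &[&'a str]) -> Option<&'a str> {
    const ORDER: [Compression; 3] = [Compression::Zstd, Compression::Xz, Compression::Gzip];
    ORDER.iter().find_map(|wanted| {
        candidates
            .iter()
            .copied()
            .find(|name| compression_from_name(name) == Some(*wanted))
    })
}
```

With this ordering, a manifest that lists only `.tar.gz` and `.tar.xz` still resolves to the `.tar.xz` file, so older channels keep working unchanged.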
For reference, on my laptop:
$ du -hs rust-1.46.0-x86_64-unknown-linux-gnu.tar*
842M rust-1.46.0-x86_64-unknown-linux-gnu.tar
211M rust-1.46.0-x86_64-unknown-linux-gnu.tar.gz
122M rust-1.46.0-x86_64-unknown-linux-gnu.tar.xz
99M rust-1.46.0-x86_64-unknown-linux-gnu.tar.zst
$ time gzip -dc rust-1.46.0-x86_64-unknown-linux-gnu.tar.gz >/dev/null
real 0m4.890s
user 0m4.853s
sys 0m0.036s
$ time xz -dc rust-1.46.0-x86_64-unknown-linux-gnu.tar.xz >/dev/null
real 0m9.218s
user 0m9.160s
sys 0m0.056s
$ time zstd -qdc rust-1.46.0-x86_64-unknown-linux-gnu.tar.zst >/dev/null
real 0m0.839s
user 0m0.811s
sys 0m0.028s
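To make the comparison easier to read, the figures above can be turned into compression ratios and decompression throughput. This is only a sketch over the single run quoted here: the sizes are the rounded values printed by `du -h`, the times are the `real` values, and the `Sample` type and function names are made up for illustration.

```rust
/// One row of the measurements quoted above (hypothetical type).
pub struct Sample {
    pub name: &'static str,
    /// Compressed size as printed by `du -h` (rounded).
    pub compressed_mib: f64,
    /// Wall-clock ("real") decompression time in seconds.
    pub decompress_secs: f64,
}

/// Size of the uncompressed tarball as printed by `du -h`.
pub const TAR_MIB: f64 = 842.0;

pub const SAMPLES: [Sample; 3] = [
    Sample { name: "gzip", compressed_mib: 211.0, decompress_secs: 4.890 },
    Sample { name: "xz", compressed_mib: 122.0, decompress_secs: 9.218 },
    Sample { name: "zstd", compressed_mib: 99.0, decompress_secs: 0.839 },
];

/// Compressed size as a percentage of the uncompressed tarball.
pub fn ratio_percent(s: &Sample) -> f64 {
    100.0 * s.compressed_mib / TAR_MIB
}

/// Uncompressed output produced per second of decompression.
pub fn output_mib_per_sec(s: &Sample) -> f64 {
    TAR_MIB / s.decompress_secs
}

/// One line per format, e.g. for pasting into a comparison table.
pub fn report() -> String {
    SAMPLES
        .iter()
        .map(|s| {
            format!(
                "{}: {:.1}% of original, {:.0} MiB/s\n",
                s.name,
                ratio_percent(s),
                output_mib_per_sec(s)
            )
        })
        .collect()
}
```

On these numbers gzip, xz and zstd come out at roughly 25%, 14% and 12% of the original size, and zstd decompresses about eleven times faster than xz.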
@joshtriplett Could you also run compression timing? If zstd is significantly slower we probably can't afford it.
@Mark-Simulacrum Depends heavily on compression level and what tradeoff we want to make. How much time does the current xz compression use?
I don't think we have timings, so it's hard to say. Does decompression time not correlate with compression time much then? (Beyond "disk reading is slower with more data")?
I'd guess a good way to estimate things would be for someone to read https://github.com/rust-lang/rust-installer/blob/d66f476b4d5e7fdf1ec215c9ac16c923dc292324/src/tarballer.rs#L49-L56 and either lower that into xz command-line arguments or write up a small Rust program that we can test things out with. Then we'd download all the artifacts for a single commit from S3 (I can help there, though the rustup manifest is an easy place to start), and write up a small-ish table of compression/decompression times for various settings on each of the tools. Then we can try to select our best trade-off; CI time these days is mostly plentiful on dist builders, and if we do this during release promotion we can definitely take somewhat longer.
Does decompression time not correlate with compression time much then? (Beyond "disk reading is slower with more data")?
No, it doesn't. Compression can take anywhere from "faster than gzip" to "several minutes", with corresponding improvements in data size; decompression of all of those is proportional to data size, never compression time. Taking several minutes would be worth it for stable release tarballs, while nightly/CI versions could scale that back a little and aim for taking the same amount of time in CI that we currently do.
@joshtriplett @Mark-Simulacrum Was there consensus on this in the end? Is this still on the cards?
I think we should gather some more timing and size data, but I expect that we should indeed support zstd compression (and perhaps even make that our canonical format, instead of xz, in the next several cycles after it rolls out).
That said, I would like us to figure out a plan for limiting ourselves to maybe 2-3 long-term supported compression formats for cost / time reasons; gzip probably needs to stick around for compatibility (though I don't think I've encountered servers that support gzip but lack xz myself), but I'm not sure about xz vs zstd.
cc @rust-lang/release
It's not that much pain for rustup to support all the formats, though we don't have to generate them all for each channel release.
With that said, I'll consider the work to add zstd support to rustup in the near future.
Yeah, I'm mostly worried about storing 2-3x as much data (plus the time to recompress things) if we're going to accumulate new algorithms over time, not implementation complexity of each piece.
I think that if zstd looks promising, the right thing will be to swap xz for it. gzip is a good idea to retain long-term for compatibility I guess.
Honestly, zstd isn't hard to come by. I don't think it'd be unreasonable
to drop gzip as long as rustup supports zst.
From a rustup perspective we have to support everything ever published to a release channel, so swapping isn't really what we'd do.
I'm not suggesting that rustup can drop the support, since they need to
be able to install old releases. I'm suggesting that the release
channels could, eventually, drop other formats.
The nice thing is that once the client has zst support, the server-side can decide how aggressively to compress for each tarball. For instance, if stable artifacts get more downloads, we could spend more resources compressing them.
Currently the channel manifest toml only has gz and bz2 support, and not in an entirely extensible way. I think that we need to define how the channel toml will represent the available compression formats, and thence what is considered 'acceptable' in terms of available formats. E.g. would a channel manifest with only .zst be okay? For that to be the case a lot of tooling may need adjustment.
I've given this some thought already and will continue to do so, and may bring it up for wider discussion at the next dev-tools sync.
True, not bz2, braino on my part :D
@joshtriplett If you use a master build of rustup then there's now .zst support there though I'd consider it entirely rough-cut for now. If you'd prefer to wait until it's in a released version before experimentation that's fine, but now you should be able to have a play around with a theoretically .zst-only toolchain channel manifest if you wanted. Just use zst_url and zst_hash as you might expect.
I won't be advertising this beyond a changelog entry along the lines of "Experimental support for ZStd compressed dist artifacts" for now.
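For anyone experimenting, here is a minimal sketch of how a client might choose between the per-target download keys. It assumes the existing manifest keys are `url`/`hash` for gzip and `xz_url`/`xz_hash` for xz, alongside the `zst_url`/`zst_hash` keys mentioned above; the `TargetDownloads` struct and its `best` method are hypothetical and do not reflect how rustup actually models the manifest.

```rust
/// Download keys for one target of one package (hypothetical struct).
/// Every key is optional so that a .zst-only manifest can be represented.
#[derive(Debug, Default, Clone)]
pub struct TargetDownloads {
    pub url: Option<String>,
    pub hash: Option<String>,
    pub xz_url: Option<String>,
    pub xz_hash: Option<String>,
    pub zst_url: Option<String>,
    pub zst_hash: Option<String>,
}

/// A URL is only usable when its hash is present as well.
fn pair<'a>(url: &'a Option<String>, hash: &'a Option<String>) -> Option<(&'a str, &'a str)> {
    Some((url.as_deref()?, hash.as_deref()?))
}

impl TargetDownloads {
    /// Return the (url, hash) pair to download: zst first, then xz, then gz.
    pub fn best(&self) -> Option<(&str, &str)> {
        pair(&self.zst_url, &self.zst_hash)
            .or_else(|| pair(&self.xz_url, &self.xz_hash))
            .or_else(|| pair(&self.url, &self.hash))
    }
}
```

Requiring both halves of each pair means a manifest entry with a `zst_url` but no `zst_hash` falls back to the next format rather than downloading something that cannot be verified.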
@89z I didn't say that any of the channels had been updated, merely that the in-development branch of rustup could now support it if Josh wanted to experiment. Don't expect this any time soon.
@kinnison Thank you!
I'm working on patches to try this in the Rust distribution process. Right now, those patches are waiting on https://github.com/gyscos/zstd-rs/pull/117 to go into a released version of the zstd crate, because Rust tarballs benefit greatly from zstd's long-distance matching mode.
I'm working on patches to try this in the Rust distribution process.
Great to hear - if you need any extra work on our side to enable your experiments, please let me know. I fully expect that we'll not have done everything right first time :D
I worry about the reference to memory footprint: we have literally just
resolved issues extracting large files from tarballs on platforms like
raspberry pi; requiring hundreds of MB of RAM in working set to decompress
will not work for a significant number of use cases.
@rbtcollins I'd expect zst to typically take less memory on decompression, and much less decompression time (by a factor of 10). And gzip isn't going away.
Just a note, I'm not sure whether it will be feasible to provide three compression formats. Our releases are already huge and we store them forever, so adding more duplication could be an issue. The infra team would need to discuss what we're going to serve to users.
Rustup would need to figure out logic for when to choose which one as
well... probably solvable, but not zero cost.
@pietroalbini I'm not expecting that we should supply three. I'm hoping that we move from gz/xz to gz/zst. I'm working on a proposal for that.