Rust: Rust platform size

Created on 20 Jun 2019 · 18 Comments · Source: rust-lang/rust

I went to this page:

https://forge.rust-lang.org/other-installation-methods

and I discovered that the download is 203 MB. This was surprising to me, so I
looked at other languages:

Language | Size | Link
---------|--------|----------------------
Go | 135 MB | https://golang.org/dl
Perl | 102 MB | http://strawberryperl.com/releases.html
Julia | 50 MB | https://julialang.org/downloads
Python | 25 MB | https://python.org/downloads/release/python-373
PHP | 25 MB | https://windows.php.net/download
D | 23 MB | https://dlang.org/download.html
Nim | 19 MB | https://nim-lang.org/install_windows.html
Ruby | 11 MB | https://rubyinstaller.org/downloads

So Rust is about 50% larger than Go. Or to put it another way, Rust is larger than Julia,
Python, PHP, D, Nim and Ruby combined. Can anything be done about this, or is
the large size unavoidable?
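
As a quick sanity check of the comparison, the figures from the table can be plugged into a few lines of Python. The sizes are the ones quoted above; the variable names are only for illustration.

```python
# Installer sizes in MB, copied from the table above.
sizes_mb = {
    "Rust": 203,
    "Go": 135,
    "Perl": 102,
    "Julia": 50,
    "Python": 25,
    "PHP": 25,
    "D": 23,
    "Nim": 19,
    "Ruby": 11,
}

# How much larger is Rust than Go, as a percentage?
rust_vs_go_pct = (sizes_mb["Rust"] / sizes_mb["Go"] - 1) * 100

# Combined size of the six smallest installers in the table.
small_six = ["Julia", "Python", "PHP", "D", "Nim", "Ruby"]
combined_mb = sum(sizes_mb[name] for name in small_six)

print(f"Rust vs Go: {rust_vs_go_pct:.0f}% larger")
print(f"Six smallest combined: {combined_mb} MB")
```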

C-enhancement I-heavy T-compiler

All 18 comments

IIUC https://github.com/rust-lang/rust/pull/59800 will reduce the size, potentially by a lot

Also the comparison might want to include Haskell, whose installer is 268 MB (https://www.haskell.org/platform/windows.html).

These comparisons probably miss various aspects: e.g. the Haskell installer includes the full Haskell Platform, with a list of packages as well as a runtime. Rust might include other things... We should compare with our past selves first and foremost.

@cup where are you getting that 205 MB number from? When I try to download https://static.rust-lang.org/dist/rust-1.35.0-x86_64-unknown-linux-gnu.tar.gz it tells me 252 MB, so the problem is even worse than you say it is... Maybe Linux vs Windows? Some points on where the potential bloat could come from:

  • Licenses are stored multiple times, once for each package. This costs a few KB, but I guess it can't be avoided. A compression format that does content-based deduplication would be useful, but I guess there is none out there; tar.gz/zip is what we need to use.

  • Save analysis of all std crates is being stored. Because it uses JSON, the save-analysis format is extremely verbose. It contains repetitive data, like references to the same source file over and over. It could probably be improved a lot by a) interning things like source file names and b) switching to a binary format. As IIRC serde is already used for JSON output and input, adopting bincode would be an easy first step, followed by interning (interning can be done transparently from bincode). The save-analysis format is unstable and probably always will be, given that a replacement is on its way with rust-analyzer. If there is ever a desire to stabilize, one can switch back to JSON. As for the point that we are using deflate compression: sure, it helps, but a native compression would help even more.

  • Rust's copy of LLVM is 76 MiB while Julia's copy of LLVM is 48 MB (both uncompressed). Why is that the case? Rust has LLVM 8.0 and Julia has 6.0, but that can't be the reason; has LLVM grown this much in size between two versions?

  • There is a documentation directory that contains rustdoc/mdbook output for various components. Each component has its very own copy of the fonts (each almost half a MB large), so they are repeated multiple times. I'm sure much can be saved by using a shared copy of those fonts.
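
To illustrate the interning idea from the save-analysis point above, here is a minimal Python sketch. The record layout (`file`, `line`, `name`) and the file paths are made up for the example and are not the real save-analysis schema; the sketch only shows how replacing repeated file-name strings with indices into a shared table shrinks the JSON output.

```python
import json

def intern_files(records):
    """Replace each record's file name with an index into a shared table."""
    table = []      # unique file names, in first-seen order
    index_of = {}   # file name -> position in `table`
    compact = []
    for rec in records:
        name = rec["file"]
        if name not in index_of:
            index_of[name] = len(table)
            table.append(name)
        compact.append(
            {"file": index_of[name], "line": rec["line"], "name": rec["name"]}
        )
    return {"files": table, "records": compact}

def expand(interned):
    """Inverse of intern_files: restore the original records."""
    files = interned["files"]
    return [dict(rec, file=files[rec["file"]]) for rec in interned["records"]]

# Hypothetical records: many references into the same two source files.
records = [
    {
        "file": f"src/libstd/collections/hash/{'map' if i % 2 else 'set'}.rs",
        "line": i,
        "name": f"item_{i}",
    }
    for i in range(1000)
]

plain = json.dumps(records)
interned = json.dumps(intern_files(records))
print(f"plain JSON: {len(plain)} bytes, interned JSON: {len(interned)} bytes")
```

A binary encoding would be a separate, additional saving on top of this; the sketch keeps JSON so that only the effect of interning is visible.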

> rust's copy of LLVM is 76 MiB while julia's copy of LLVM is 48 MB (both uncompressed). Why is that the case?

At least for Linux a lot of dependencies are linked statically (even libstdc++) to make it work on old Linux releases.

@cup no need to wait for nightlies, the built artifacts are already being uploaded to public servers. This gives us the following size impact of #59800 :

| file name | dbeed58adee2ef046b46b252980f86672f9bfc4c (prior to #59800 merge) | dd2e8040a35883574ae0c4cc7a4e887ecb66469c (after #59800 merge) |
| - | - | - |
| rustc-nightly-x86_64-unknown-linux-gnu.tar.gz | 122 073 207 | 100 939 501 |
| rust-std-nightly-x86_64-unknown-linux-gnu.tar.gz | 82 951 040 | 211 427 679 |
| cargo-nightly-x86_64-unknown-linux-gnu.tar.gz | 6 751 312 | 6 752 518 |
| rustc-nightly-x86_64-pc-windows-msvc.tar.gz | 82 391 063 | 59 805 292 |
| rust-std-nightly-x86_64-pc-windows-msvc.tar.gz | 74 663 690 | 206 604 497 |
| cargo-nightly-x86_64-pc-windows-msvc.tar.gz | 4 450 362 | 4 443 326 |

Numbers collected via commands like `export COMMIT=dd2e8040a35883574ae0c4cc7a4e887ecb66469c; export TOOL=rustc-nightly-x86_64-unknown-linux-gnu.tar.gz; curl -I https://s3-us-west-1.amazonaws.com/rust-lang-ci2/rustc-builds/${COMMIT}/${TOOL} | rg Content-Length`.

So for rustc there is a nice sweet reduction but std has a massive size increase. @Zoxc why is this the case? Maybe some cache not being emptied?
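
The net effect is easier to see as per-file and per-platform deltas. The following Python sketch only does the arithmetic on the byte counts from the table above; nothing is downloaded, and the helper name `delta` is just illustrative.

```python
# Byte counts copied from the table: (before #59800, after #59800).
sizes = {
    "linux-gnu": {
        "rustc": (122_073_207, 100_939_501),
        "rust-std": (82_951_040, 211_427_679),
        "cargo": (6_751_312, 6_752_518),
    },
    "windows-msvc": {
        "rustc": (82_391_063, 59_805_292),
        "rust-std": (74_663_690, 206_604_497),
        "cargo": (4_450_362, 4_443_326),
    },
}

MIB = 1024 * 1024

def delta(before, after):
    """Return (change in MiB, change in percent) between two byte counts."""
    return (after - before) / MIB, (after - before) / before * 100

for platform, components in sizes.items():
    for name, (before, after) in components.items():
        mib, pct = delta(before, after)
        print(f"{platform:13} {name:9} {mib:+8.1f} MiB ({pct:+6.1f}%)")
    # Sum the three components to get the net change per platform.
    total_before = sum(b for b, _ in components.values())
    total_after = sum(a for _, a in components.values())
    mib, pct = delta(total_before, total_after)
    print(f"{platform:13} {'total':9} {mib:+8.1f} MiB ({pct:+6.1f}%)")
```

Summed over the three components, both platforms come out a bit over 100 MiB larger after the merge: the rustc savings are much smaller than the rust-std growth.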

@est31 rust-std also includes the compiler crates, and with https://github.com/rust-lang/rust/pull/59800 there's duplication for these as the same code is included in the rlibs and rustc_driver's dylib.

Hi, I have a question. What needs those rlib files in rust-std? Can I remove those rlib files from rust-std and have most crates build as normal? (I'm repackaging rust-nightly for Arch Linux.)

It's 109 MiB larger now, and it'll take me four more minutes to download (what's worse, I can't watch online videos in the meantime).

Rust also has some generic bloat problems, like https://github.com/rust-lang/rust/issues/46477, which increase binary size and probably contribute to Rust's large platform size, too.

Before #59800, a lot of symbols were placed into shared libs, so duplication wasn't an issue.
Many of those shared libs were replaced by static ones (.rlib), which has benefits (better performance, simplification of Rust internals) and downsides (duplicated symbols, bigger size).

Using .tar.xz archives instead of .tar.gz reduces download size of linux-gnu std by 30 MiB at the expense of extraction time. It can help with slow connections.
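
One reason xz can do better on archives with duplicated content is that its default dictionary is several megabytes, while DEFLATE (gzip) can only refer back 32 KiB. The Python sketch below shows this on synthetic data, not on the real Rust archives: a 64 KiB block of pseudo-random bytes is repeated eight times, so the repeats are too far apart for gzip to notice. Actual savings on a toolchain archive depend on its contents.

```python
import gzip
import lzma
import random

# Synthetic stand-in for an archive: one 64 KiB block of incompressible
# bytes repeated eight times, so the duplicates sit far apart.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
payload = block * 8

gz = gzip.compress(payload)  # DEFLATE: back-references limited to 32 KiB
xz = lzma.compress(payload)  # LZMA2: multi-megabyte dictionary by default

print(f"raw: {len(payload)}  gzip: {len(gz)}  xz: {len(xz)}")
```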

> Hi, I have a question. What needs those rlib files in rust-std?

Rust itself and the crates.

> Can I remove those rlib files from rust-std and have most crates build as normal?

Some of the libs aren't necessary for every use case but they are relatively small. Big libs which are the issue here cannot be removed.


It doesn't mean nothing can be improved.

Several dependencies are built more than once; this is clearly visible with libc. In the linux-gnu std from commit dd2e8040a35883574ae0c4cc7a4e887ecb66469c there are 2 liblibc-*.rlib files of 2 MiB each.
Most of the duplicates are much smaller, but they are also harder to find because they don't have their own libs and only show up inside other libs.

EDIT: cc #57076

@mati865 thank you for the detailed response. However this part is concerning:

> Using .tar.xz archives instead of .tar.gz reduces download size of linux-gnu
> std by 30 MiB at the expense of extraction time. It can help with slow
> connections.

That seems to be "cutting at the branches rather than cutting at the root". To
compare, Go 1.12.6 is 118 MB:

https://dl.google.com/go/go1.12.6.windows-amd64.msi

Then Rust is 294 MB:

https://static.rust-lang.org/dist/rust-nightly-x86_64-pc-windows-gnu.msi

That's nearly 2.5 times as large. I know every language is different, but this
seems too much.

> that seems to be "cutting at the branches rather than cutting at the root"

It's not the solution by any means; I just wanted to make people aware of better-compressed archives.

> I know every language is different but this
> seems too much.

Yes, it's concerning. For the sake of completeness, there is another big compiler to compare with:
the official LLVM .tar.xz Linux archive is 325 MiB, and for Rust commit dd2e8040a35883574ae0c4cc7a4e887ecb66469c it's 262 MiB.
It also means Rust is now bigger than GHC.

That said, nightly to nightly it's a download size regression of over 100 MiB. There should be an effort to reduce it, but there is no immediate solution.

> Can I remove those rlib files from rust-std and have most crates build as normal?

> Some of the libs aren't necessary for every use case but they are relatively small. Big libs which are the issue here cannot be removed.

The two biggest are librustc.rlib (85M) and librustc_mir.rlib (42M) -- are you sure those cannot be removed? I suppose nightly/unstable users may need those, but for stable ISTM we only need the artifacts from x.py build libstd and libtest (which includes the proc-macro shim).

We could potentially replace the .rlibs with just .rmetas, which would still allow those crates to be used; we'd just need a way to force linking against librustc_driver.so (which contains the actual machine code from those crates).

I've previously suggested this in https://github.com/rust-lang/rust/pull/59800#issuecomment-503990460.

I'm sorry for being dense, but what is the reason for having two copies of the compiler? That is, each distribution contains the duplicates lib/librustc_driver-7f45b60a9f549617.dylib and lib/rustlib/x86_64-apple-darwin/lib/librustc_driver-7f45b60a9f549617.dylib (among many others).

lib/* are host libraries installed by the rustc component, and lib/rustlib/$target/lib/* are target libraries installed by the rust-std component. The latter may be used for cross compilation too.

But yes, that duplication is unfortunate.
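
For anyone who wants to measure this kind of duplication in an extracted toolchain, a small Python sketch along the following lines could list files that are stored more than once. It assumes the copies are byte-identical (it groups files by a SHA-256 of their contents); the function name `find_duplicates` is made up for the example.

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    """Group files under `root` by the SHA-256 of their contents.

    Returns a list of path groups, each holding two or more identical files.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.islink(path):
                continue  # symlinks do not store a second copy
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Hash in 1 MiB chunks so large libraries fit in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates` on the directory where a dist tarball was unpacked would be expected to report pairs such as the two librustc_driver copies mentioned above, as well as repeated fonts and license files, provided they really are identical on disk.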

Can you explain more what purpose they serve for cross compilation? Targets which do not have host builds do not include a second copy of the compiler. Is it to support projects that link the compiler directly with extern crate rustc? I'm not sure I see how the target's librustc_driver.so is relevant for cross compiling.

> Is it to support projects that link the compiler directly with extern crate rustc?

Yes, you could cross compile something using rustc libs -- unstable, of course. But if #64823 goes through, we won't include this in rust-std anymore, probably just another optional component.
