Rust: Tracking issue for MIR-only RLIBs

Created on 8 Jan 2017 · 35 Comments · Source: rust-lang/rust

There's been some talk about switching RLIBs to "MIR-only", that is, make RLIBs contain only the MIR representation of a program and not the LLVM IR and machine code as they do now. This issue will try to collect some advantages, disadvantages, and other concerns such an approach would entail:

Advantages

  • Less code duplication, which has four benefits:

    • RLIBs would be smaller because they would not contain LLVM IR and machine code anymore.

    • RLIBs and leaf crates would be smaller because, at the moment, instantiations of generic functions show up multiple times in the object code and LLVM IR.

    • RLIBs and leaf crates would be smaller because the compiler would be able to instantiate monomorphic functions strictly on demand, as @japaric points out.

    • Possibly faster whole-project compiles, since generic instances are never compiled multiple times (although see "Disadvantages")

  • RLIBs would compile faster because the trans and LLVM passes would always be skipped (much like when compiling with --emit=metadata).

  • At the moment libstd is compiled with -Cdebuginfo=1, which is good in general but as a side effect increases the size of Rust binaries, even if they are built without debuginfo (because the debuginfo from libstd gets statically linked into the binaries). This problem would not exist with MIR-only rlibs.

  • In the past we've had problems with WeakODR linkage and COMDAT sections on MinGW. WeakODR linkage is one way to deal with duplicate generic instances, and avoiding those duplicates would also remove any reason to use WeakODR.

  • We would always get LTO-grade compiler optimizations since all code is available at codegen time.

  • Some targets, like NVPTX, don't seem to support regular linking (see #38787). Only generating object code in leaf crates would solve this problem.

  • There seems to be some indication that MIR-only RLIBs would help with making the Rust compiler more backend agnostic (see WASM-related issue #38804).

  • Generating LLVM IR only in leaf crates would make it easier to add comprehensive LLVM-based instrumentation like LeakSanitizer without recompiling libstd (see #38699), as @japaric points out.

  • All Rust code (even that from libstd) can be compiled with -C target-cpu=native, potentially resulting in better code, as @japaric points out.

  • The build process of multi-crate projects would gain more parallelism, since downstream crates would no longer need to wait for upstream crates' codegen before compiling everything up to the linking phase, as @est31 points out.

Disadvantages

  • The leaf crates (executables, staticlibs, dylibs, cdylibs) would take more time to compile because

    1. the machine code of monomorphic functions from upstream crates would not be "cached" anymore, and

    2. since LLVM sees more code at once, some super-linear optimizations would take disproportionately more time (like when one compiles with LTO now)

  • People might rely on pub #[no_mangle] items being exported from RLIBs and link against them directly. This would not be possible anymore, as @nagisa points out.

Non-Advantages

  • MIR-only libs would not be platform independent. One might expect them to be, but because of cfg switches, MIR itself is not platform independent.

Mitigation strategies for disadvantages:

  1. The problem of caching machine code would be solved in a generalized form by incremental compilation. One has to keep in mind though that incremental compilation will produce less performant code because it prevents many opportunities for inlining.
  2. We could provide an additional, more coarse-grained codegen unit partitioning scheme for incremental compilation (e.g. one CGU per crate) for better runtime performance at the cost of longer compile times.
  3. The amount of code LLVM sees at once can easily be controlled via -C codegen-units already, which provides a means of reducing super-linear optimizations.

Open Questions

  • I think we support "bundling" native libraries into RLIBs. We might still need to keep supporting this, even if we don't store machine code originating from Rust?

Please help collect more data on the viability of MIR-only RLIBs.

cc @rust-lang/core @rust-lang/compiler @rust-lang/tools @rkruppe

A-codegen A-mir C-tracking-issue T-compiler

All 35 comments

This could also break people who link against rlibs expecting them to at least expose extern #[no_mangle] functions, as they do currently.

I did that at least once before, though the application where I did it was already very hacky for other reasons and I do not think the project is around anymore.
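
For illustration, a minimal sketch (hypothetical names, not from the thread) of the kind of item such users rely on:

```rust
// An exported, unmangled symbol that someone might pull out of the rlib
// archive and link against directly from non-Rust code. With MIR-only
// rlibs, no machine code for this function would exist until a leaf crate
// (bin/staticlib/cdylib) is compiled.
#[no_mangle]
pub extern "C" fn my_entry_point(x: u32) -> u32 {
    x.wrapping_add(1)
}
```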

Another advantage I see is that pure MIR RLIBs effectively let us "recompile"
std with different codegen options without needing std-aware Cargo or Xargo.
This is assuming that the std component that rustup installs will also be
pure MIR.

Basically, cargo rustc --release -- -C target-cpu=native would optimize std
for the host CPU "on the fly". Today, this requires using Xargo to recompile
std (i.e. RUSTFLAGS="-C target-cpu=native" xargo build).

Another case where one uses Xargo to recompile std is producing an
executable that aborts on panic!s without the overhead of landing pads (the
std component that rustup installs contains landing pads because it's
compiled with -C panic=unwind). With pure MIR RLIBs, after you set panic = "abort" in your Cargo.toml, cargo build will give you an executable that
doesn't contain landing pads (everything would get compiled with -C panic=abort).

cc @brson ^ pure MIR RLIBs would eliminate the need for std-aware Cargo and
Xargo for some scenarios.

I expect the above will also make using sanitizers (cf rust-lang/rust#38699)
straightforward. Using a sanitizer requires (re)compiling everything with an
extra LLVM pass and linking to the sanitizer runtime, which is written in C/C++.
With pure MIR RLIBs using a sanitizer would become as simple as cargo rustc -- -Z sanitizer=address; that would compile everything, including std, with the
extra LLVM pass and also link the runtime, which would be provided as an e.g.
librustc_asan.rlib in the std component.

cc @alexcrichton ^ relevant to sanitizer support

I think we support "bundling" native libraries into RLIBs. We might still need to keep supporting this, even if we don't store machine code originating from Rust?

This would be required for the "easy sanitizers" scenario I'm describing above.
Or we could ship the sanitizers as "static libraries", i.e. librustc_asan.a.

Also, note that today one can build statically linked Rust programs using the
MUSL targets without needing to have MUSL installed because libc.a is
embedded inside the std rlib (libstd.rlib) that ships with the std
component.

@japaric that's a tricky advantage, as it prevents us from adding any MIR optimisations that depend on the codegen options set :) We already have one which acts upon -Zno-landing-pads (ergo -C panic=abort)

@nagisa @japaric isn't platform independence listed in the issue description as a non-advantage?

MIR-only libs would not be platform independent. One might expect them to be, but because of cfg switches, MIR itself is not platform independent.

I'd add to the advantages that it would add more parallelism: the passes up to MIR take less time than the passes up to codegen, and in combination with codegen-units you can now compile the code more in parallel than before. E.g. right now when I bootstrap the compiler, the "whole world" waits for the rustc crate to compile in a single thread. With the change, we wait less, as only its MIR has to be available before we can continue. Afterwards, when doing the codegen for the binary, we can simply use codegen-units to get the maximum amount of parallelism the hardware gives us.

Could you elaborate on how it would prevent you from adding such optimizations? The way I see it is that the std component will probably continue to be compiled with -C panic=unwind so if you then compile your app with -C panic=abort then LLVM won't be able to optimize as well (or as fast) as if you had recompiled std with -C panic=abort because of the MIR optimizations you mention. However, we would still be better off than today where the std component is shipped filled with landing pads. Or does LLVM always emit landing pads everywhere if the MIR "optimization" you mention is not present? (In that case, it no longer sounds like an optimization but more like a requirement)

If you want the most optimized code possible then, yeah, you would have to use Xargo or std-aware Cargo to opt into MIR optimizations that depend on codegen options. While you are at it you can also throw in --mir-opt-level=3, etc.

I agree that MIR optimisations don't really prevent you from having platform agnostic MIR. As both their input and output is MIR, those optimisations could be run in the leaf crates, once the target and other info is known.

However, if earlier stages in the compiler depend on the target, which is the case with cfg, one would either have to refactor the entire compiler to understand cfg's in all later stages, or simulate compilation with all possible combinations of cfg's enabled/disabled (in the end cfg is an on/off question). The first approach would probably hugely bloat the code complexity of the compiler; the second would bloat runtime complexity exponentially in the number of cfg's used.

So MIR will probably stay platform dependent for some time.
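
As a small illustration of the cfg point (a generic sketch, not code from the thread): cfg attributes are resolved during expansion, well before MIR is produced, so the stored MIR only contains the variant selected for the target the rlib was built for.

```rust
// Only one of these two bodies survives into the crate's MIR, depending on
// the target the rlib was compiled for; the other is stripped before MIR
// is ever built.
#[cfg(unix)]
pub fn path_separator() -> char {
    '/'
}

#[cfg(not(unix))]
pub fn path_separator() -> char {
    '\\'
}
```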

@est31

isn't platform independence listed in the issue description as a non-advantage?

I'm not sure what you are trying to get at? The -C target-cpu=native optimizations I'm referring to are about LLVM having access to the IR of all functions so it can apply autovectorization, CPU scheduling optimizations, etc. Whereas today -C target-cpu=native is not as good because libstd.rlib already contains machine code that was optimized with -C target-cpu=generic. All these optimizations are "within an architecture", e.g. x86, and happen after e.g. cfg(target_arch) has taken effect, so I'm not sure how "platform agnostic MIR" is related.

@japaric Removing landing pads from MIR is already somewhat of a problem since you cannot add them back after the fact, so you already lose some of the so-called advantage by being unable to reverse that. Later on we might want to add something more invasive. For a completely hypothetical example, consider something resembling autovectorisation, which, again, is not exactly reversible, and thus -C target-feature=-stuff would become a no-op as well. -C debuginfo=2? Stripped to keep binaries smaller because of -C debuginfo=0 before. -C debug-assertions? A no-op even without MIR optimisations, as debug assertions are essentially a #[cfg].
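
To illustrate the debug-assertions point (a rough sketch, not from the thread): debug_assert! is guarded by cfg!(debug_assertions), so whether the check exists at all is baked in when the defining crate's MIR is produced, not when the leaf crate runs codegen.

```rust
// Whether this check is present is decided by the -C debug-assertions
// setting of the crate that contains it, at macro-expansion time; storing
// only MIR in the rlib does not make it configurable again downstream.
pub fn get(slice: &[u8], i: usize) -> u8 {
    debug_assert!(i < slice.len(), "index out of bounds");
    slice[i]
}
```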

So, what I'm trying to say is that specifying codegen options on leaf crates only would still not be equivalent (and would diverge more over time with extra hypothetical MIR opts) to specifying the codegen option(s) for every crate.

You could (as @est31 did just now) argue for storing unoptimised MIR instead, but that, in addition to increasing the size of intermediate rlibs, serializes MIR opts.

isn't platform independence listed in the issue description as a non-advantage

Codegen options aren't exactly related to platform independence in this context.

@michaelwoerister

I'm not sure if this can be listed as an advantage, but pure MIR RLIBs would have prevented #38824. The TL;DR is that LLVM raises assertions when you try to lower functions that take/return i128 values to PTX code / MSP430 instructions because of bugs in LLVM. With pure MIR RLIBs I expect that if the leaf crate doesn't make use of i128 at all then those functions that use i128 would never be fed into LLVM, so the LLVM assertions wouldn't have been triggered. I suppose that would be some sort of "dead code elimination" pass at the MIR level. So, basically, less IR could be fed into LLVM with the right analysis.
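
A hypothetical sketch of the scenario (illustrative only, not the code from #38824):

```rust
// A monomorphic upstream function that uses i128. With MIR-only rlibs, if
// the leaf crate never calls it, MonoItem collection starting from the leaf
// crate's roots never reaches it, so the buggy i128 lowering for PTX/MSP430
// is never exercised by LLVM.
pub fn widen_mul(a: i64, b: i64) -> i128 {
    (a as i128) * (b as i128)
}
```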

@japaric

I'm not sure what you are trying to get at?

Ah, sorry, I've misread, you only talked about codegen options.

cc @solson @oli-obk (Miri developers)

Great points @nagisa, @japaric, and @est31! I've added all of them to the list.

I don't think it will affect const evaluation one way or another, but being able to easily build dependencies as MIR-only rlibs (with MIR for _all_ items, not just generic/inline/constant ones like in the existing metadata) would help us test Miri outside rustc.

Largely, Miri is just like another backend in this context, so it is an instance of this previously mentioned advantage:

There seems to be some indication that MIR-only RLIBs would help with making the Rust compiler more backend agnostic (see WASM-related issue #38804).

The problem of caching machine code would be solved in a generalized form by incremental compilation. One has to keep in mind though that incremental compilation will produce less performant code because it prevents many opportunities for inlining.

MSVC using /LTCG:INCREMENTAL is able to achieve LTO with incremental compilation with very fine granularity without sacrificing inlining. According to a blog post, the runtime performance cost of their incremental LTCG vs standard LTCG is less than half a percent, while providing massive gains in link time. So doing something equivalent in Rust is definitely a practical possibility, although it would require a significant amount of support from LLVM. Hopefully ThinLTO will be the magic bullet that provides the necessary support.

Regarding #[no_mangle] pub items from rlibs: While it's unfortunate to break anyone's use case, I think this is only a minor disadvantage. It is not documented that rlibs are ordinary archives with some special contents, in fact this is an implementation detail. In addition, we've had breaking changes in compiler output (e.g., #29520, and at this very second #38876) for lesser reasons.

(I would have more sympathy if someone could give a good reason for using rlibs as archives that isn't already covered by staticlib, cdylib, and other existing tools.)

I'm very enthusiastic about this. I think separating type checking and code generation into two phases is smart no matter the exact strategy for when the MIR finally gets translated. It gives us a lot of flexibility for coordinating the build. For example, we don't have to delay code generation until the final crate. Cargo itself could spawn parallel processes to do code generation for already-typechecked crates, while their downstreams continue type checking.

By collapsing duplicate monomorphizations, I'm hopeful that this will lead to significant improvements to the major disadvantages of monomorphization, the bloat and the compile time. We could end up in a position where we can say, "the generics model is like C++, but more efficient". That could be a major advantage.
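
A generic sketch of the duplication being collapsed (an assumed example, not from the thread): if two crates in the graph both call largest::<u32>, today each of their rlibs carries its own object-code copy of that instantiation, whereas with MIR-only rlibs a single copy would be generated in the leaf crate.

```rust
// A library-crate generic. Every downstream crate that instantiates it,
// e.g. as largest::<u32>, currently gets its own compiled copy in its rlib.
pub fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut best = *items.first()?;
    for &x in &items[1..] {
        if x > best {
            best = x;
        }
    }
    Some(best)
}
```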

One significant disadvantage with this model is link-time scalability. This will put massive memory pressure on the leaf crate builds, and that could bite us in the future as bigger projects are written in Rust.

LTO is a downside too because of compile time. I'd expect we'd need a range of strategies for the actual codegen, to accomplish different goals in -O0 vs -O3.

The leaf crates (executables, staticlibs, dylibs, cdylibs) would take more time to compile

I'm very worried about this for Servo. There are currently 319 crates in the dependency graph, but after an initial build only a few of them are recompiled in the typical edit-build-test cycle. Even so, compile times are already pretty bad.

Do MIR-only rlibs mean doing code generation for the entire dependency graph every time? This sounds like an unacceptable explosion of compile times.

@SimonSapin I see no point in experimenting with this on Servo's scale without enabling incremental recompilation (with ThinLTO in the future, too).

Btw I hear @rkruppe is making good progress towards such a compilation mode.

We discussed this in the last @rust-lang/tools meeting and the consensus was that this looks like a good idea in many ways, but we will not pursue it as long as it would mean a significant compile time regression.

So then we'll be pursuing this as soon as Rust is able to fully take advantage of incremental compilation using ThinLTO?

Given the recent work to make the compiler incremental, I wonder if it will be possible to perform incremental builds at the level of individual functions, caching anything that hasn't changed. That could allow amazing feats, such as executables that are incrementally updated as the user compiles their source code.

Just jotting this down before I forget about it again: Currently, statics are always translated locally, never cross-crate. If we stick to this, rlibs would still generate some object files that contain only statics, no code. However, that invites a bunch of headaches. For example, if a static references a function (e.g. an interrupt vector table storing function pointers), we'd need to translate those too — or remember them somewhere and use them as roots for trans item collection in downstream crates.

So it would be cleaner to also delay translation of statics to the final binary/staticlib/cdylib. This requires non-trivial refactoring though, as a lot of the current code is written under the assumption that all statics to translate come from the current crate (e.g., TransItem::Static stores a NodeId, not a DefId).

It also means metadata needs a way to enumerate all the statics and other collector roots (monomorphic functions, and some more things in "eager" mode) from other crates. The information is all there, but there's no efficient/easy way to enumerate them.
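
A hypothetical sketch of the headache described above (names invented for illustration):

```rust
// A static holding function pointers, e.g. an interrupt vector table. If
// statics were still translated locally in the rlib, the referenced
// handlers would have to be translated there as well, or recorded somewhere
// as extra collection roots for downstream crates.
pub static VECTORS: [unsafe extern "C" fn(); 2] = [on_timer, on_uart];

unsafe extern "C" fn on_timer() {}
unsafe extern "C" fn on_uart() {}
```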

We can do a stepwise migration towards MIR-only RLIBs:

  1. store all MIR in rlibs (not just generics, #[inline] and constants).
  2. optimize the MIR before storing it (this exposes the issues with statics mentioned above, because we now can inline functions that contain statics and aren't marked #[inline])
  3. add an unstable compiler flag that tries to build from MIR instead of precompiled artifacts
  4. remove issues one by one until crater runs through
  5. remove the old path.

optimize the MIR before storing it (this exposes the issues with statics mentioned above, because we can now inline functions that contain statics and aren't marked #[inline])

Can you elaborate on the parenthetical? I don't think MIR inlining on its own can have any effect on where and how statics are translated. Even when statics are lexically nested in a function, they're not part of the function's MIR. Statics are also trans-item-collect'd separately from MIR (as part of walking the HIR of the current crate), at least last time I checked.

Can you elaborate on the parenthetical?

I have been getting undefined references to statics inside functions that were inlined into other crates when compiling libstd via xargo with -Zalways-encode-mir -Zmir-opt-level=3. I have done some digging, but didn't get to the root of it.
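
A hypothetical reduction of the kind of code involved (the actual root cause was not pinned down in the thread):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// A function with a lexically nested static. If aggressive MIR inlining
// copies this body into a downstream crate, that crate's object code ends
// up referencing COUNTER across crates; if the symbol is internalized or
// never emitted, linking fails with an undefined reference.
pub fn next_id() -> u32 {
    static COUNTER: AtomicU32 = AtomicU32::new(0);
    COUNTER.fetch_add(1, Ordering::Relaxed)
}
```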

I believe I found yet another benefit to MIR-only rlibs: Currently, cargo check builds the entire dependency graph with --emit metadata only, to avoid running translation. The downside is that the output of --emit metadata (*.rmeta files) is sufficiently different from normal (rlib) outputs that you need to rebuild the whole dependency graph when it's time to cargo build (and conversely, if you have a full build and run cargo check, all rmeta files are generated from scratch). This duplicates compilation effort and metadata on disk.

With MIR-only rlibs, the only remaining differences between rlibs and rmeta files would be (1) that the rlib wraps the metadata in an archive file, and (2) that the archive includes bundled native libraries, if any. Creating the archive should have negligible cost, so we could probably get rid of rmeta files and make --emit metadata behave like --emit link for non-leaf crates. It would still need to avoid running codegen in leaf crates (so it's not quite an alias) but it would greatly reduce the aforementioned duplication.

(Going further, @nagisa (I think?) once suggested to me on IRC that metadata and machine code should be two separate files on disk. I found this appealing for other reasons, but to stay on-topic, such a split would make it possible to pick up a previously-generated rmeta file and generate all the machine code and so on from it, without recompiling the leaf crate from scratch. But that is mostly orthogonal to MIR-only rlibs, so whatever.)

@oli-obk re: the undefined references to statics: I don't have much time to investigate, but one culprit I can think of would be internalize_symbols. Specifically, if a (non-pub) static is apparently only used from within the CGU it's defined in, it is marked as internal and LLVM will remove it if it's not accessed anywhere in that CGU.

Anyway, it would be great if you could file an issue for that (if there isn't one already) with a small test case. This is definitely a bug, but so far I don't believe it's an issue with statics getting translated locally.

Uuh, I'm scratching my head about what I'm missing here: how is this going to work with -C linker=? We're relying on the fact that the hash of the input file to the linker is the same every invocation. If the objects get translated to native code before being passed to the linker, is the translation stable? Or will the target's system linker actually only ever see a single already relocated and re-ordered object file?

This issue only affects which Rust code (or monomorphization of generic Rust code) gets translated into which LLVM compilation unit. It doesn't affect what happens afterwards with these LLVM modules, the resulting object files, etc. — and while it's plausible that MIR-only rlibs would enable more innovation in the later stages of the backend, nothing along those lines has been proposed or even discussed as far as I remember.

Another (marginal) benefit, assuming #[inline] stops copying function bodies into multiple codegen units as discussed in the context of #44941: #[inline] becomes less necessary (it would only add inlinehint, instead of being what enables cross-crate inlining at all in certain cases) and less complicated (easier to explain, easier to tell if it's useful).
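
For example (a sketch under the assumption that the #44941 change lands): a monomorphic #[inline] function like the one below is currently instantiated in every codegen unit that calls it, while with MIR-only rlibs its MIR would be available to downstream crates anyway.

```rust
// Without #[inline], this body is currently not available for inlining in
// other crates (short of LTO); with MIR-only rlibs it would be available to
// every leaf crate regardless, so the attribute becomes a mild hint rather
// than a prerequisite for cross-crate inlining.
#[inline]
pub fn clamp01(x: f32) -> f32 {
    x.max(0.0).min(1.0)
}
```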

I've put together a proof-of-concept implementation of this in https://github.com/rust-lang/rust/pull/48373. Although the implementation crashes for many crates, I was able to collect timings for a number of projects. The tables show the aggregate time spent for various tasks while compiling the whole crate graph. In many cases we do less work overall but due to worse parallelization, wall-clock time increases. I.e. everything seems to be bottlenecked on the MIR-to-LLVM translation in the leaf crates. To me this suggests that MIR-only RLIBs are blocked on the compiler internals being parallelized.

ripgrep - cargo build

| | regular | MIR-only | % |
|------------------------|----------|----------|----------|
| LLVM codegen passes | 33.90 | 32.52 | 95.9 % |
| LLVM function passes | 1.39 | 1.35 | 97.5 % |
| LLVM module passes | 2.18 | 1.95 | 89.8 % |
| MonoItem collection | 2.80 | 2.09 | 74.4 % |
| translation | 23.73 | 19.97 | 84.1 % |
| LLVM total | 37.46 | 35.83 | 95.6 % |
| BUILD total | 20.92 | 26.14 | 125.0 % |

encoding-rs - cargo test --no-run

| | regular | MIR-only | % |
|------------------------|----------|----------|----------|
| LLVM codegen passes | 13.11 | 7.28 | 55.6 % |
| LLVM function passes | 0.57 | 0.33 | 58.1 % |
| LLVM module passes | 0.90 | 0.44 | 48.7 % |
| MonoItem collection | 1.19 | 0.69 | 58.1 % |
| translation | 8.68 | 6.08 | 70.1 % |
| LLVM total | 14.59 | 8.06 | 55.2 % |
| BUILD total | 15.73 | 14.37 | 91.4 % |

webrender - cargo build

| | regular | MIR-only | % |
|------------------------|----------|----------|----------|
| LLVM codegen passes | 109.42 | 69.17 | 63.2 % |
| LLVM function passes | 4.55 | 3.10 | 68.2 % |
| LLVM module passes | 1.63 | 1.06 | 64.7 % |
| MonoItem collection | 10.70 | 5.64 | 52.7 % |
| translation | 102.95 | 58.70 | 57.0 % |
| LLVM total | 115.60 | 73.33 | 63.4 % |
| BUILD total | 72.30 | 68.64 | 94.9 % |

futures-rs - cargo test --no-run

| | regular | MIR-only | % |
|------------------------|----------|----------|----------|
| LLVM codegen passes | 41.19 | 48.67 | 118.1 % |
| LLVM function passes | 1.68 | 1.93 | 115.0 % |
| LLVM module passes | 0.21 | 0.22 | 107.3 % |
| MonoItem collection | 5.86 | 6.90 | 117.8 % |
| translation | 55.48 | 69.84 | 125.9 % |
| LLVM total | 43.08 | 50.82 | 118.0 % |
| BUILD total | 17.28 | 19.18 | 111.0 % |

tokio-webpush-simple - cargo build

| | regular | MIR-only | % |
|------------------------|----------|----------|----------|
| LLVM codegen passes | 33.98 | 22.55 | 66.4 % |
| LLVM function passes | 1.53 | 0.95 | 62.0 % |
| LLVM module passes | 0.30 | 0.20 | 66.8 % |
| MonoItem collection | 3.80 | 2.11 | 55.5 % |
| translation | 39.09 | 22.99 | 58.8 % |
| LLVM total | 35.81 | 23.70 | 66.2 % |
| BUILD total | 22.28 | 21.54 | 96.7 % |

Number of LLVM function definitions generated for whole crate graph

| | MIR-only | regular |
|----------------------|----------|-----------|
| ripgrep | 22683 | 34239 |
| encoding-rs test | 8393 | 15116 |
| webrender | 72238 | 114239 |
| futures-rs test | 57565 | 46935 |
| tokio-webpush-simple | 27346 | 44961 |

What if we did this, but for libcore...libstd, at stage1? It might be worth it, despite the huge number of tests, and should be a huge improvement when running just a few tests.

(prompted by @dwijnand's comments on Discord about their workflow of changing librustc and re-checking only one test - with incremental, most of the time is spent building libcore...libstd)

EDIT: here's some data, since I wanted to replicate what @dwijnand was seeing:

  • no incremental, libstd:

    • stage0 check: 29s (./x.py check src/libstd)

    • stage0 build: 36s

    • stage1 build: 58s (maybe because of LLVM/debug-assertions?)

  • incremental at stage0, after doing touch src/librustc/lib.rs:

    • (sadly, this still produces a slow stage1/bin/rustc - until #53673 reaches beta)

Building stage0 compiler artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
    Finished release [optimized] target(s) in 1m 01s
Building stage0 codegen artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu, llvm)
    Finished release [optimized] target(s) in 48.75s
Building stage1 std artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
    Finished release [optimized] target(s) in 7m 39s
Building stage1 test artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
    Finished release [optimized] target(s) in 30.12s

Most of the time is spent building libstd, which should be improved once #53673 ends up in beta (perhaps at the cost of the rustc build time?), so the performance impact of using MIR-only rlibs might become less significant - we'll have to wait and see, I suppose.

Now that cargo passes --embed-bitcode=no, is there anything left to do for this?

It turns out I was confused - this issue is about never going through LLVM at all for upstream crates, while with --embed-bitcode=no rlibs still contain the object code generated by LLVM (they just no longer embed LLVM bitcode alongside it).

https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/How.20to.20learn.20more.20about.20crate.20metadata.3F/near/216169138
