The rustc guide states that:
- Stage 0: the stage0 compiler is usually the current _beta_ compiler (x.py will download it for you); you can configure x.py to use something else, though.
- Stage 1: the code in your clone (for new version) is then compiled with the stage0 compiler to produce the stage1 compiler.
However, a run of rustbuild like ./x.py build --stage 0 outputs:
Building stage0 std artifacts (x86_64-apple-darwin -> x86_64-apple-darwin)
Building stage0 test artifacts (x86_64-apple-darwin -> x86_64-apple-darwin)
Building stage0 compiler artifacts (x86_64-apple-darwin -> x86_64-apple-darwin)
[etc]
Personally, I think the guide makes more sense. However, fixing rustbuild would mean that --stage 1 would become --stage 2 (and so forth).
(I think this misunderstanding has been at the root of all my rustc feedback loop problems, so I'd be happy to resolve it or attempt to.)
FWIW, the stages printed in rustbuild's output aren't wrong -- the stage0 std artifacts there are going to be used for anything built by the stage0 compiler: for example, the test and rustc artifacts. This pattern continues in later stages.
Specifically, "Stage 0: the stage0 compiler is usually the current beta compiler" is true, but that's (mostly, modulo build scripts in stage0 std compilation) only true for the rustc binary itself (and associated dynamic libraries). The compiler we download is never used for anything beyond compiling rustbuild, std, test, and rustc, but when we're compiling test and rustc, that compiler is using the freshly produced std.
I'm not sure if any of that made sense -- I myself still struggle with this pretty much every time I confront any sort of staging issue or work with rustbuild. If you have suggestions, I'd love to hear them. I think the problem is that there are two concepts at play here: a compiler (with its set of dependencies) and its "target" libraries (std, test, and rustc-ish). Both are staged, but in a sort of staggered manner. That makes talking about any of this quite hard.
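The staggered scheme described above can be sketched as a toy model. This is a hypothetical illustration, not rustbuild code: `Compiler`, `Artifact`, and `build_with` are made-up names, and test, codegen, and tools are left out. It encodes only the rule from this thread: the stage N compiler builds artifacts that rustbuild labels "stage N", and the rustc among them is then uplifted to become the stage N+1 compiler.

```rust
/// Toy model of rustbuild's staggered staging (hypothetical, not rustbuild's API).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Compiler {
    stage: u32,
}

#[derive(Debug, PartialEq)]
struct Artifact {
    name: &'static str,
    /// rustbuild labels artifacts with the stage of the compiler that built them.
    label_stage: u32,
}

/// The stage N compiler builds std, then rustc (linked against that std).
/// The freshly built rustc is uplifted and becomes the stage N+1 compiler.
fn build_with(compiler: Compiler) -> (Vec<Artifact>, Compiler) {
    let artifacts = vec![
        Artifact { name: "std", label_stage: compiler.stage },
        Artifact { name: "rustc", label_stage: compiler.stage },
    ];
    (artifacts, Compiler { stage: compiler.stage + 1 })
}

fn main() {
    // Stage 0 is the downloaded beta compiler.
    let mut compiler = Compiler { stage: 0 };
    for _ in 0..2 {
        let (artifacts, next) = build_with(compiler);
        for a in &artifacts {
            println!(
                "stage{} compiler builds \"stage{} {}\", uplifted into stage{}",
                compiler.stage, a.label_stage, a.name, next.stage
            );
        }
        compiler = next;
    }
}
```

The point of the model is the off-by-one: the label on an artifact names the compiler that produced it, not the stage it ends up in.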
I also found the situation quite confusing.
@Mark-Simulacrum Perhaps it would be easier to understand and give words to things if we had a sort of sequence/activity diagram or flowchart for stages and dependencies that exist. A picture says...
I have found it to be confusing, too. I had some notes lying around about how it works. Perhaps one way to think of it is:
stage 0 uses the stage 0 compiler to create stage 0 artifacts, which will later be uplifted to stage 1
That's a bit convoluted. I would expect it to be more along the lines of "stage 0 builds stage 1", but alas it is not.
It's also confusing because building HOST std and TARGET std differs depending on the stage (notice below how stage2 only builds non-host std targets; I don't know why). And --keep-stage still seems a bit confusing to me.
Stage 0 Action | Output
------ | ------
beta extracted | build/HOST/stage0
stage0(beta) builds bootstrap | build/bootstrap
stage0(beta) builds libstd | build/HOST/stage0-std/TARGET
copy stage0-std (HOST only) | build/HOST/stage0-sysroot/lib/rustlib/HOST
stage0(beta) (sysroot stage0-sysroot) builds libtest | build/HOST/stage0-test/TARGET
copy stage0-test (HOST only) | build/HOST/stage0-sysroot/lib/rustlib/HOST
stage0(beta) (sysroot stage0-sysroot) builds rustc | build/HOST/stage0-rustc/HOST
copy stage0-rustc (except executable) | build/HOST/stage0-sysroot/lib/rustlib/HOST
build llvm | build/HOST/llvm
stage0(beta) (sysroot stage0-sysroot) builds codegen | build/HOST/stage0-codegen/HOST
stage0(beta) (sysroot stage0-sysroot) builds rustdoc | build/HOST/stage0-tools/HOST
--stage=0 stops here
Stage 1 Action | Output
------ | ------
copy (uplift) stage0-rustc executable | build/HOST/stage1/bin
copy (uplift) stage0-sysroot | build/HOST/stage1/lib
stage1 (sysroot stage1) builds libstd | build/HOST/stage1-std/TARGET
copy stage1-std (HOST only) | build/HOST/stage1/lib/rustlib/HOST
stage1 (sysroot stage1) builds libtest | build/HOST/stage1-test/TARGET
copy stage1-test (HOST only) | build/HOST/stage1/lib/rustlib/HOST
stage1 (sysroot stage1) builds rustc | build/HOST/stage1-rustc/HOST
copy stage1-rustc (except executable) | build/HOST/stage1/lib/rustlib/HOST
stage1 (sysroot stage1) builds codegen | build/HOST/stage1-codegen/HOST
--stage=1 stops here
Stage 2 Action | Output
------ | ------
copy (uplift) stage1-rustc executable | build/HOST/stage2/bin
copy (uplift) stage1-sysroot | build/HOST/stage2/lib and build/HOST/stage2/lib/rustlib/HOST
stage2 (sysroot stage2) builds libstd (except HOST?) | build/HOST/stage2-std/TARGET
copy stage2-std (not HOST targets) | build/HOST/stage2/lib/rustlib/TARGET
stage2 (sysroot stage2) builds libtest (except HOST?) | build/HOST/stage2-test/TARGET
copy stage2-test (not HOST targets) | build/HOST/stage2/lib/rustlib/TARGET
stage2 (sysroot stage2) builds rustdoc | build/HOST/stage2-tools/HOST
copy rustdoc | build/HOST/stage2/bin
--stage=2 stops here
Notes:
Maybe adding a little extra detail to rustc-guide would help? Does any of that help, or is it more confusing now?
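The directory layout in the tables above follows a regular pattern that can be written down as two helpers. This is a sketch reconstructed only from the tables, assuming a host-only build; `cargo_out_dir` and `host_rustlib_dir` are hypothetical names, not rustbuild functions, and `x86_64-apple-darwin` is just the host from the log earlier in this thread.

```rust
/// Directory where cargo puts what the stage `stage` compiler builds,
/// e.g. build/HOST/stage0-std/TARGET (per the tables above).
fn cargo_out_dir(host: &str, stage: u32, krate: &str, target: &str) -> String {
    format!("build/{host}/stage{stage}-{krate}/{target}")
}

/// Directory the HOST copies are placed in for the next build step.
/// Stage 0 uses a separate "stage0-sysroot"; later stages use build/HOST/stageN.
fn host_rustlib_dir(host: &str, stage: u32) -> String {
    if stage == 0 {
        format!("build/{host}/stage0-sysroot/lib/rustlib/{host}")
    } else {
        format!("build/{host}/stage{stage}/lib/rustlib/{host}")
    }
}

fn main() {
    let host = "x86_64-apple-darwin";
    for stage in 0..2 {
        for krate in ["std", "test"] {
            println!(
                "{} -> {}",
                cargo_out_dir(host, stage, krate, host),
                host_rustlib_dir(host, stage)
            );
        }
    }
}
```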
Thank you all for your comments.
I understand now that stage0 is a full-fledged stage and no one feels that's wrong, which is a good thing, because it means we just need to fix the guide, not change the semantics of rustbuild.
I think there are several misleading things in that section of the rustc guide:
I'll come back to this, use your info and insights, and propose some changes to the rustc-guide repo.
Here's another part that has me doubting everything I think I know about how rustc is built. The guide says:
In particular, the newer version of the compiler, libstd, and other tooling may use some unstable features internally.
and in a few more places it refers to libstd as being the compiler. Is that true?? I thought libstd was the standard library. Isn't src/rustc the new compiler? Or maybe, for consumption by rustup or for rustc's test suite, it's something a step down, like src/librustc_driver or so.
Also, in https://rust-lang.github.io/rustc-guide/how-to-build-and-run.html#workflow, about the command ./x.py build -i --stage 1 src/libstd --keep-stage 1, it says:
The effect of --keep-stage 1 is that we just _assume_ that the old standard library can be re-used.
Which "old standard library" does it mean there, stage 0 or stage 1? Also, it's confusing to target stage 1 and keep stage 1. Assuming it's not equivalent to ./x.py build -i --stage 0 src/rustc, I think the guide should detail why it's not.
@dwijnand the first bit is just a bit confusingly worded. rustc is the compiler and libstd is the standard library (which includes libcore, liballoc, etc.). It just means "rustc, and libstd, and other tooling may use unstable features internally".
@jonas-schievink I'm not sure. You're right, that could be the interpretation, but https://rust-lang.github.io/rustc-guide/how-to-build-and-run.html#build-flags says about running ./x.py build -i --stage 1 src/libstd:
And again in https://rust-lang.github.io/rustc-guide/how-to-build-and-run.html#workflow:
- Initial build: ./x.py build -i --stage 1 src/libstd
  - As documented above, this will build a functional stage1 compiler
Yes, you need both the Rust compiler rustc and a libstd compatible with that compiler in order to build Rust programs (unless you use #![no_std] or #![no_core]). The second thing you linked is pretty misleading, though: passing std/libstd to x.py won't just build a compiler. AFAIK it will perform all "Stage 0" actions described in @ehuss' comment above, which builds the stage 1 rustc, and the first few steps of the "Stage 1" actions up to "stage1 (sysroot stage1) builds libstd" (and perhaps the copy operation following that?).
...this really is hairy
A PR to the rustc guide would be much appreciated. A better naming scheme in the build system would be helpful too: if it is built with stage N, it should be called stage N+1, even if that stage contains only a libstd or only a compiler...
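To make the naming proposal concrete: today's labels name an artifact after the compiler that built it, while the proposed scheme would name it after the stage it feeds into. The two hypothetical functions below only encode that off-by-one; they are not part of rustbuild.

```rust
/// Current rustbuild convention: "stage N std" means "std built by the stage N compiler".
fn current_label(built_by_stage: u32) -> String {
    format!("stage{built_by_stage}")
}

/// Proposed convention: anything built with stage N is called stage N+1.
fn proposed_label(built_by_stage: u32) -> String {
    format!("stage{}", built_by_stage + 1)
}

fn main() {
    for built_by in 0..3 {
        println!(
            "std built by the stage{built_by} compiler: currently {}, proposed {}",
            current_label(built_by),
            proposed_label(built_by)
        );
    }
}
```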
@jonas-schievink I'm concerned that it's instructing to compile stage 1 libstd just so it does the two steps before it: copy rustc and sysroot into build/HOST/stage1/bin and build/HOST/stage1/lib... That would be very wasteful.
@mark-i-m
A better naming scheme in the build system would be helpful too: if it is built with stage N, it should be called stage N+1, even if that stage contains only a libstd or only a compiler...
Yeah, perhaps that can be achieved without any or too much disruption. I'll try and send a PR soon fixing some of the explanation and probably including Eric's breakdown.
Actually... @jonas-schievink
you need both the Rust compiler rustc _and_ a libstd compatible with that compiler in order to build Rust programs (unless you use #![no_std] or #![no_core]).
At stage 0 beta rust compiles first libstd then rustc. Are those two artefacts compatible with one another?
(take everything I say with a grain of salt, I'm not really sure how the entire bootstrap process works)
At stage 0 beta rust compiles first libstd then rustc. Are those two artefacts compatible with one another?
I think this step is needed to ensure the compiler is always built against its own libstd, not the libstd shipped with beta (downloaded as stage0). This is where the #[cfg(stage0)] annotations in libstd are used, since they make libstd build on the beta compiler. The resulting libstd is compatible with the beta compiler.
The compiler is then built and links against this libstd we just built. Now we want to build libstd again, using that compiler, which now has all features that libstd might want, so we can turn off #[cfg(stage0)]. The libstd created by that is up-to-date and compatible with the built compiler (as in, you can use the resulting rustc+libstd pair to build any Rust program using the latest Rust features).
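A minimal illustration of the #[cfg(stage0)] gating described above. Only the cfg name `stage0` comes from this thread; the function, its strings, and the file name gate.rs are made up. The sketch assumes, as the comment implies, that the build system sets the `stage0` cfg while the beta compiler is doing the building, so the gated-off variant is what every later stage sees.

```rust
// Sketch of `stage0` cfg-gating; only the cfg name comes from the thread.

// Compiled only with `--cfg stage0`, i.e. while the beta compiler is building
// this code and may lack features the in-tree compiler already has.
#[cfg(stage0)]
fn compiled_by() -> &'static str {
    "beta compiler: stick to features beta already supports"
}

// Compiled in all later stages, where the freshly built compiler is in use.
#[cfg(not(stage0))]
fn compiled_by() -> &'static str {
    "fresh compiler: newest features available"
}

fn main() {
    // `rustc gate.rs` selects the second variant; `rustc --cfg stage0 gate.rs` the first.
    println!("{}", compiled_by());
}
```

Because rustc itself is always built by a compiler that already links the fresh std, this kind of gating is mostly needed in std, not in rustc.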
(I'm not sure how the process around proc macros or compiler plugins works, I think those have a few more caveats)
Thanks, @jonas-schievink, your explanation is looking more and more likely.
The question I have about that being how it works is: if a proper, usable distribution requires a fresh compiler and a fresh library built with that compiler, why is stage0 first libstd and then rustc? It's like staging is delimited by compiler builds which is only half of a distribution build. Why not use beta rustc + beta std to build the first iteration of the distribution?
In a proper release of Rust, what's at the tail of the build? Is there a final-final-final compiler that builds libstd and together they ship?
In a proper release of Rust, what's at the tail of the build? Is there a final-final-final compiler that builds libstd and together they ship?
I believe the compiler and libs that end up in the "stage2" directory are what is typically distributed.
I made a simplified diagram of what I wrote above: https://gist.github.com/ehuss/e40c18e1678fec0aa5861fd0d1653a87
if a proper, usable distribution requires a fresh compiler and a fresh library built with that compiler, why is stage0 first libstd and then rustc?
I'm not entirely sure, but I think this is because it's convenient if we only have to make rustc build against the libstd inside the same repository instead of having to make it compatible with the downloaded beta libstd.
The libstd in this repo often has quite a few #[cfg(stage0)] annotations that make it build with the beta compiler. With this setup, we don't need those for rustc as well.
Or I'm completely wrong, and we do sometimes need #[cfg(stage0)] in rustc, and the first libstd build is unnecessary or done for other complicated reasons.
I'm just going to leave a semi-quick comment here (well, it turned out longer than expected). I want to flesh out this documentation more fully, but unfortunately I just don't have the time right now; probably after the all hands I will have the time to respond more thoroughly. I'm hopeful that we can get some good documentation/understanding here for all parties, though!
I've excerpted a few bits from above with some answers, hopefully this at least helps for now:
notice below how stage2 only builds non-host std targets — I don't know why (https://github.com/rust-lang/rust/issues/57963#issuecomment-458429280)
This is because during stage 2, the host std is uplifted from the "stage 1" std -- specifically, when you see "Building stage 1 artifacts" that's later copied into stage 2 as well (both the compiler's libdir and the sysroot).
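The stage 2 rule in this answer can be sketched as a lookup: the host std in the stage 2 sysroot is the uplifted stage 1 std, while non-host targets show up as stage 2 std builds in @ehuss' table. This is a hypothetical helper mirroring the table and this explanation, not rustbuild's actual decision logic, and the target triples in `main` are only examples.

```rust
/// Label (stage of the building compiler) of the std placed in the stage 2 sysroot
/// for `target`. Hypothetical helper based on the table and explanation above.
fn stage2_sysroot_std_label(target: &str, host: &str) -> u32 {
    if target == host {
        // Host std is not rebuilt: the stage 1 std artifacts are uplifted into stage 2.
        1
    } else {
        // Non-host targets appear as stage 2 std builds in the table.
        2
    }
}

fn main() {
    let host = "x86_64-apple-darwin";
    for target in [host, "aarch64-unknown-linux-gnu"] {
        println!(
            "stage2 sysroot std for {target}: stage{} std artifacts",
            stage2_sysroot_std_label(target, host)
        );
    }
}
```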
I'm concerned that it's instructing to compile stage 1 libstd just so it does the two steps before: copy rustc and sysroot into build/HOST/stage1/bin and build/HOST/stage1/lib... That would be very wasteful. https://github.com/rust-lang/rust/issues/57963#issuecomment-458630887
This is not wasteful -- that std is pretty much necessary for any useful work with the compiler. Specifically, it's used as the std for programs compiled by that compiler (so when you compile fn main() { }, that links to the std compiled last with x.py build --stage 1 src/libstd).
A better naming scheme in the build system would be helpful too: if it is built with stage N, it should be called stage N+1, even if that stage contains only a libstd or only a compiler...
Yeah, perhaps that can be achieved without any or too much disruption. I'll try and send a PR soon fixing some of the explanation and probably including Eric's breakdown. https://github.com/rust-lang/rust/issues/57963#issuecomment-458630887
I've considered doing this in the past, but it's actually somewhat misleading. Every time we compile any of the main artifacts ("std", "test", "rustc") we're actually performing two steps. I'll call the compiler which compiles these libraries A and the 'next' compiler B. When we compile std, for example, that std will be linked to programs built by A (including test and rustc built later on). It will also be used for compiler B to link against itself. This is somewhat intuitive if you think of compiler B as "just" a program that we're building with A. In some ways, rustc (the binary, not the rustbuild step) could be thought of as one of the only no_core binaries out there.
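The two roles of each std build described above can be sketched with a few hypothetical types: the std compiled by compiler A is linked into every program A builds, and compiler B is simply one of those programs. `StdBuild`, `Compiler`, `Program`, `compile_program`, and `compile_next_compiler` are illustrative names only.

```rust
/// Toy model of the "compiler A / compiler B" roles (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
struct StdBuild {
    /// Stage of the compiler (A) that compiled this std.
    built_by_stage: u32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Compiler {
    stage: u32,
    /// The std this compiler binary itself links against (None for the downloaded beta).
    links_against: Option<StdBuild>,
}

#[derive(Debug, PartialEq)]
struct Program {
    built_by_stage: u32,
    links_against: StdBuild,
}

/// Role 1: every program compiler A builds is linked to the std that A built.
fn compile_program(a: &Compiler, std: StdBuild) -> Program {
    assert_eq!(std.built_by_stage, a.stage, "programs link the std their compiler built");
    Program { built_by_stage: a.stage, links_against: std }
}

/// Role 2: compiler B is "just" another program built by A, so it links the same std.
fn compile_next_compiler(a: &Compiler, std: StdBuild) -> Compiler {
    let program = compile_program(a, std);
    Compiler { stage: program.built_by_stage + 1, links_against: Some(program.links_against) }
}

fn main() {
    let a = Compiler { stage: 0, links_against: None }; // the downloaded beta
    let std_a = StdBuild { built_by_stage: a.stage };
    let hello = compile_program(&a, std_a);
    let b = compile_next_compiler(&a, std_a);
    println!("program built by stage{} links {:?}", hello.built_by_stage, hello.links_against);
    println!("compiler B (stage{}) links {:?}", b.stage, b.links_against);
}
```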
At stage 0 beta rust compiles first libstd then rustc. Are those two artefacts compatible with one another? https://github.com/rust-lang/rust/issues/57963#issuecomment-458634115
The rustc that's built is linked to the freshly-built libstd. So not only are they compatible, that's the only way rustc is guaranteed to build. As @jonas-schievink discusses in https://github.com/rust-lang/rust/issues/57963#issuecomment-458641928, "the compiler is always built against its own libstd" is the correct reason for this. That means that for the most part only std needs to be cfg-gated. This means that rustc can use features added to std immediately after their addition, there's no need to wait until they get into beta.
However, "The libstd created by that is up-to-date and compatible with the built compiler (as in, you can use the resulting rustc+libstd pair to build any Rust program using the latest Rust features)." (https://github.com/rust-lang/rust/issues/57963#issuecomment-458641928) is perhaps a bit misleading. The libstd built by the stage1/bin/rustc compiler, also known as stage 1 std artifacts, is not necessarily ABI-compatible with that compiler. That is, the rustc binary most likely could not use this std itself. It is, however, ABI-compatible with any programs that the stage1/bin/rustc binary builds (including itself), so in that sense they're paired.
Perhaps worth noting -- when I say "ABI" I most likely mean more than that. I've not received any concrete answers from rustc devs as to what actually needs to stay the same -- but loosely I believe this is broadly metadata encoding and the ABI itself.
This is also where --keep-stage 1 src/libstd comes into play. Because most changes to the compiler don't actually change the ABI, once you've produced a libstd in stage 1, you can _probably_ just reuse it with a different compiler. If the ABI hasn't changed, you're good to go; no need to spend the time recompiling that std. --keep-stage simply assumes the previous compile is fine and copies those artifacts into the appropriate place, skipping the cargo invocation.
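The --keep-stage behavior described above boils down to one decision per step: if the step's stage is kept, copy the old artifacts instead of running cargo. The sketch models only that decision; `StdStep` and `std_step` are hypothetical names, and nothing here checks whether the ABI actually stayed the same, which is exactly the assumption --keep-stage makes.

```rust
/// What happens for a std step in this toy model of --keep-stage.
#[derive(Debug, PartialEq)]
enum StdStep {
    /// Invoke cargo to (re)build std with the compiler of this stage.
    RunCargo,
    /// Skip cargo and copy the previously built artifacts into place,
    /// assuming (without checking) that the ABI has not changed.
    CopyPrevious,
}

fn std_step(stage: u32, keep_stages: &[u32]) -> StdStep {
    if keep_stages.contains(&stage) {
        StdStep::CopyPrevious
    } else {
        StdStep::RunCargo
    }
}

fn main() {
    // Roughly models: ./x.py build --stage 1 src/libstd --keep-stage 1
    let keep = [1];
    for stage in 0..=1 {
        println!("stage{stage} std: {:?}", std_step(stage, &keep));
    }
}
```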
The question I have about that being how it works is: if a proper, usable distribution requires a fresh compiler and a fresh library built with that compiler, why is stage0 first libstd and then rustc? It's like staging is delimited by compiler builds which is only half of a distribution build. Why not use beta rustc + beta std to build the first iteration of the distribution? https://github.com/rust-lang/rust/issues/57963#issuecomment-458646708
In a proper release of Rust, what's at the tail of the build? Is there a final-final-final compiler that builds libstd and together they ship?
The reason we first build std, then test, then rustc, is largely just because we want to minimize cfg(stage0) in the code for rustc. Currently rustc is always linked against a "new" std/test, so it doesn't ever need to be concerned with differences in std; it can assume that the std is as fresh as possible.
The reason we need to build twice is ABI compatibility. The beta compiler has its own ABI, and the stage1/bin/rustc compiler will produce programs/libraries with the new ABI. We actually used to build three times, but because we assume that the ABI is constant within a codebase, we presume that the libraries produced by the "stage2" compiler (produced by the stage1/bin/rustc compiler) are ABI-compatible with the libraries the stage1/bin/rustc compiler produces. What this means is that we can skip that final compilation and simply use the same libraries the stage2/bin/rustc compiler uses itself for the programs it links.
This stage2/bin/rustc compiler is shipped to end-users, along with the "stage 1 {std,test,rustc}" artifacts.
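A toy summary of the two-versus-three-builds argument and of what ends up in a release. The `Release` type, `library_builds`, and `release` are hypothetical; the only facts encoded are the ones stated in this comment: the libraries used to be built three times, the ABI-constancy assumption drops that to two, and stage2/bin/rustc ships with the stage 1 std/test/rustc artifacts.

```rust
/// What ships, per the explanation above. Hypothetical types for illustration only.
#[derive(Debug, PartialEq)]
struct Release {
    compiler: &'static str,
    libraries: &'static str,
}

/// How many times the libraries get built. Assuming the ABI is constant within
/// one codebase lets the build skip the third round.
fn library_builds(assume_abi_constant: bool) -> u32 {
    if assume_abi_constant { 2 } else { 3 }
}

fn release() -> Release {
    Release {
        compiler: "stage2/bin/rustc",
        // Built by stage1/bin/rustc; reused because stage2/bin/rustc is assumed to share its ABI.
        libraries: "stage 1 std/test/rustc artifacts",
    }
}

fn main() {
    let r = release();
    println!("ships {} with {}", r.compiler, r.libraries);
    println!("library builds: {}", library_builds(true));
}
```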
I hope this was helpful -- I'll try to make some time to leave another response tomorrow, maybe even put together that graphic @Centril suggested, but not sure if that'll happen yet.
Thank you, @Mark-Simulacrum, for that. I must confess that while I did understand parts of it, large parts flew over my head. 😕 However...
I want to more fully flesh out this documentation, unfortunately I just don't have the time right now; probably after the all hands I will have the time to respond more thoroughly. I'm hopeful that we can get some good documentation/understanding here for all parties though!
I'll hold off on making any changes myself, then. If you send changes to the rustc-guide I could help review them and perhaps add some of my own.
I have a proposal for explaining this better in https://github.com/rust-lang/rustc-dev-guide/pull/843. Please let me know if that helps at all, and if not, what I could do to make it better :)