As reported on Twitter: https://twitter.com/jleedev/status/964324747277455360
Looks like ripgrep takes far more CPU power to compile with 1.24 than it did with 1.23.
Is there a way to run RUSTFLAGS="-Z time-llvm-passes -Z time-passes" on stable/beta to get some background data on this issue?
@matthiaskrgr the RUSTC_BOOTSTRAP=1 environment variable should do the trick.
Looks like it's the codegen units that make the difference here. (Note: this system has an i5-7200U, a dual-core / 4-thread CPU.)
The tweet suggests that the machine could be a Raspberry Pi; how many cores do those usually have?
````
RUSTC_BOOTSTRAP=1 RUSTFLAGS="-Z time-passes -C codegen-units=1" cargo +1.23.0 build --release
real 1m31,136s
user 3m5,831s
sys 0m3,018s
````
````
RUSTC_BOOTSTRAP=1 RUSTFLAGS="-Z time-passes" cargo +1.23.0 build --release
real 1m29,970s
user 3m1,953s
sys 0m2,963s
````
````
RUSTC_BOOTSTRAP=1 RUSTFLAGS="-Z time-passes -C codegen-units=1" cargo +stable build --release
real 1m27,657s
user 2m57,814s
sys 0m3,034s
````
````
RUSTC_BOOTSTRAP=1 RUSTFLAGS="-Z time-passes" cargo +stable build --release
real 1m34,053s
user 5m35,334s
sys 0m4,888s
````
Parallel codegen units are not as efficient as the old serial approach, which is likely the cause.
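A quick back-of-the-envelope check using the stable-toolchain timings above (`user` is total CPU time across all threads, `real` is wall-clock time) shows where the extra work goes:

```shell
# Numbers copied from the `time` output above (stable toolchain):
#   default codegen-units: real 1m34s, user 5m35s
#   codegen-units=1:       real 1m28s, user 2m58s
awk 'BEGIN {
  cgu_def_user = 5*60 + 35; cgu_def_real = 1*60 + 34
  cgu_one_user = 2*60 + 58; cgu_one_real = 1*60 + 28
  printf "CPU/wall ratio, default CGUs:  %.2f\n", cgu_def_user / cgu_def_real
  printf "CPU/wall ratio, CGUs=1:        %.2f\n", cgu_one_user / cgu_one_real
  printf "extra CPU time with defaults:  %.0f%%\n", 100 * (cgu_def_user / cgu_one_user - 1)
}'
```

So with the default codegen units the wall-clock time is roughly the same, but nearly double the CPU time is burned.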
My bad, is this a compile-time regression?
triage: P-medium
There doesn't seem to be a lot to do here, other than perhaps reconsidering our defaults?
cc @alexcrichton -- what is the current strategy for --release, anyway? (How many CGUs, etc.?)
Release builds default to 16 codegen units.
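For projects where the extra CPU usage matters more than build parallelism, the default can be overridden per profile in Cargo.toml (a sketch; the value to pick depends on the project):

```toml
# Cargo.toml -- opt back into serial codegen for release builds.
# codegen-units = 1 restores the pre-1.24 behavior: no parallel codegen,
# but less total CPU time and typically slightly better-optimized binaries.
[profile.release]
codegen-units = 1
```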
I believe this is sort of "expected fallout" from enabling multiple codegen units by default in release builds. Given that the overall wall-clock time didn't regress here, this is likely just "business as usual".
@alexcrichton - I'd caution us against treating this as "business as usual". The Rust compiler is consuming more energy and yielding nothing in return. Since one of the core tenets of Rust is its efficiency, being so much less efficient during compilation feels like a step backward.
Like Aaron likes to say, we should find a way to "bend the curve" and avoid improving in some way only to noticeably regress in others.
@jonathandturner I agree we should be cautious, and I agree broadly that we shouldn't brush off issues like this. That being said, we aren't getting nothing in return from ThinLTO; rather, we're seeing broad improvements across the board. Some crates regress a little, but most see huge improvements with parallel compilation.
I'm less and less sure about ThinLTO. The code it produces is less performant than with full LTO, so it's not the best choice for release artifacts. At the same time, it compiles more slowly than plain -C codegen-units=N, which is usually optimized enough for the "debug-opt" use case.
@michaelwoerister it's true that ThinLTO + multiple codegen units for a crate isn't always as fast as one codegen unit for that crate, but I think we're really undervaluing the compile time wins from multiple codegen units and overvaluing microbenchmarks and a few percent performance differences.
For example, I feel like multiple codegen units + ThinLTO is a fantastic default for rustc. Builds feel much faster nowadays on multicore machines, as the compiler is no longer sitting with one CPU pegged for multiple minutes on librustc. Stats on perf.rust-lang.org show that ThinLTO does indeed regress runtime performance by ~5%, but in the grand scheme of things that seems like a tiny (and barely perceptible) tradeoff for the compile-time wins we're seeing.
I also feel like we write off the compile times of release builds too frequently as "oh it's fine if it takes a long time to build". In reality though I feel like release builds happen way more often than we necessarily perceive. For example all Rust CI builds are in release mode (as we publish artifacts) and we can't really get around that. I'd imagine that there are other projects where you're testing an optimized binary pretty frequently but still need to make changes.
Tagging with T-compiler (something I was tempted to do last week but I'm not sure why I balked at that time).