As reported on Twitter: https://twitter.com/jleedev/status/964324747277455360
Looks like ripgrep takes far more CPU power to compile with 1.24 than it did with 1.23.
Is there a way to run RUSTFLAGS="-Z time-llvm-passes -Z time-passes" on stable/beta to get some background data on this issue?
@matthiaskrgr the RUSTC_BOOTSTRAP=1 environment variable should do the trick.
Looks like it's the codegen units that make the difference here. (Note: this system has an i5-7200U, a dual-core / 4-thread CPU.)
The tweet suggests that the machine could be a Raspberry Pi; how many cores do those usually have?
````
RUSTC_BOOTSTRAP=1 RUSTFLAGS="-Z time-passes -C codegen-units=1" cargo +1.23.0 build --release
real 1m31,136s
user 3m5,831s
sys 0m3,018s
````
````
RUSTC_BOOTSTRAP=1 RUSTFLAGS="-Z time-passes" cargo +1.23.0 build --release
real 1m29,970s
user 3m1,953s
sys 0m2,963s
````
````
RUSTC_BOOTSTRAP=1 RUSTFLAGS="-Z time-passes -C codegen-units=1" cargo +stable build --release
real 1m27,657s
user 2m57,814s
sys 0m3,034s
````
````
RUSTC_BOOTSTRAP=1 RUSTFLAGS="-Z time-passes" cargo +stable build --release
real 1m34,053s
user 5m35,334s
sys 0m4,888s
````
Parallel codegen units are not as efficient as the old serial approach, which is likely the cause.
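A quick back-of-the-envelope check using the stable-toolchain timings above (`user` is total CPU time across all threads, `real` is wall-clock time) shows where the extra work goes:

```shell
# Numbers copied from the `time` output above (stable toolchain):
#   default codegen-units: real 1m34s, user 5m35s
#   codegen-units=1:       real 1m28s, user 2m58s
awk 'BEGIN {
  cgu_def_user = 5*60 + 35; cgu_def_real = 1*60 + 34
  cgu_one_user = 2*60 + 58; cgu_one_real = 1*60 + 28
  printf "CPU/wall ratio, default CGUs:  %.2f\n", cgu_def_user / cgu_def_real
  printf "CPU/wall ratio, CGUs=1:        %.2f\n", cgu_one_user / cgu_one_real
  printf "extra CPU time with defaults:  %.0f%%\n", 100 * (cgu_def_user / cgu_one_user - 1)
}'
```

So with the default codegen units the wall-clock time is roughly the same, but nearly double the CPU time is burned.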
My bad, is this a compile-time regression?
triage: P-medium
There doesn't seem to be a lot to do here, other than perhaps reconsidering our defaults?
cc @alexcrichton -- what is the current strategy for --release, anyway? (How many CGUs, etc.?)
Release builds default to 16 codegen units.
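For projects where the extra CPU usage matters more than build parallelism, the default can be overridden per profile in Cargo.toml (a sketch; the value to pick depends on the project):

```toml
# Cargo.toml -- opt back into serial codegen for release builds.
# codegen-units = 1 restores the pre-1.24 behavior: no parallel codegen,
# but less total CPU time and typically slightly better-optimized binaries.
[profile.release]
codegen-units = 1
```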
I believe this is sort of "expected fallout" from enabling multiple codegen units by default in release builds. Given that the overall wall-clock time didn't regress here, this is likely just "business as usual".
@alexcrichton - I'd caution us against treating this as "business as usual". The Rust compiler is consuming more energy and yielding nothing in return. Since one of the core tenets of Rust is its efficiency, being so much less efficient during compilation feels like a step backward.
Like Aaron likes to say, we should find a way to "bend the curve" and avoid improving in some way only to noticeably regress in others.
@jonathandturner I agree we should be cautious, and I agree broadly that we shouldn't brush off issues like this. That being said, we aren't getting nothing in return from ThinLTO; rather, we're seeing broad improvements across the board. Some crates regress a little, but most see huge improvements with parallel compilation.
I'm less and less sure about ThinLTO. The code it produces is less performant than with full LTO, so it's not the best choice for release artifacts. At the same time, it compiles more slowly than plain -C codegen-units=N, which is usually optimized enough for the "debug-opt" use case.
@michaelwoerister it's true that ThinLTO + multiple codegen units for a crate isn't always as fast as one codegen unit for that crate, but I think we're really undervaluing the compile time wins from multiple codegen units and overvaluing microbenchmarks and a few percent performance differences.
For example, I feel like multiple codegen units + ThinLTO is a fantastic default for rustc. Builds feel much faster nowadays on multicore machines, as the compiler is no longer sitting with one CPU pegged for multiple minutes on librustc. Stats on perf.rust-lang.org show that ThinLTO does indeed regress runtime performance by ~5%, but in the grand scheme of things that seems like a tiny (and barely perceptible) tradeoff for the compile-time wins we're seeing.
I also feel like we write off the compile times of release builds too frequently as "oh it's fine if it takes a long time to build". In reality though I feel like release builds happen way more often than we necessarily perceive. For example all Rust CI builds are in release mode (as we publish artifacts) and we can't really get around that. I'd imagine that there are other projects where you're testing an optimized binary pretty frequently but still need to make changes.
Tagging with T-compiler (something I was tempted to do last week but I'm not sure why I balked at that time).