Long compilation time (almost infinity on real project) with every nightly versions after 2017-03-18.
I tried this:
git clone https://github.com/not-fl3/incremental_bug
CARGO_INCREMENTAL=1 cargo build --release
I expected to see this happen:
On 2017-03-18 it compiles for less than a second (in 0.59 secs on my machine)
Instead, this happened:
With every version newer than 2017-03-18 it compiles for 389+ seconds on my machine.
rustc --version --verbose:
rustc 1.19.0-nightly (06fb4d256 2017-04-30)
binary: rustc
commit-hash: 06fb4d25642a3f223db1441972dd5962085cfba1
commit-date: 2017-04-30
host: x86_64-unknown-linux-gnu
release: 1.19.0-nightly
LLVM version: 4.0
Also same behaviour on this versions:
nightly-2017-03-28-x86_64-unknown-linux-gnu
nightly-2017-04-22-x86_64-unknown-linux-gnu
nightly-2017-04-30-x86_64-unknown-linux-gnu
Backtrace:
log of -Z time-asses: https://gist.github.com/not-fl3/8c4ee411fe6c23d4b4abb9a3b6a263d0
Not reproducible for me on macOS.
$ time RUST_INCREMENTAL=1 cargo build --release
Compiling incremental_bug v0.0.1 (file:///Users/kennytm/Downloads/incremental_bug)
warning: field is never used: `name`
--> src/main.rs:9:5
|
9 | name : String,
| ^^^^^^^^^^^^^
|
= note: #[warn(dead_code)] on by default
warning: field is never used: `next`
--> src/main.rs:11:5
|
11 | next : U
| ^^^^^^^^
|
= note: #[warn(dead_code)] on by default
Finished release [optimized] target(s) in 1.12 secs
real 0m1.445s
user 0m0.417s
sys 0m0.297s
$ rustc -vV
rustc 1.19.0-nightly (777ee2079 2017-05-01)
binary: rustc
commit-hash: 777ee20796e80a31d4b7c985dd68eda2941460d6
commit-date: 2017-05-01
host: x86_64-apple-darwin
release: 1.19.0-nightly
LLVM version: 4.0
Are you sure nightly-2017-03-18 is fast while nightly-2017-03-19 is slow?
@kennytm versions I tried: 2017-03-28, 2017-04-22 and 2017-04-30.
$ cargo build --release
Compiling incremental_bug v0.0.1 (file:///home/fl3/fun/outrage_test/incremental_bug)
warning: field is never used: `name`
--> src/main.rs:9:5
|
9 | name : String,
| ^^^^^^^^^^^^^
|
= note: #[warn(dead_code)] on by default
warning: field is never used: `next`
--> src/main.rs:11:5
|
11 | next : U
| ^^^^^^^^
|
= note: #[warn(dead_code)] on by default
Finished release [optimized] target(s) in 443.30 secs
rustc -vV
rustc 1.19.0-nightly (06fb4d256 2017-04-30)
binary: rustc
commit-hash: 06fb4d25642a3f223db1441972dd5962085cfba1
commit-date: 2017-04-30
host: x86_64-unknown-linux-gnu
release: 1.19.0-nightly
LLVM version: 4.0
Also tried on different linux boxes - same problem.
@not-fl3 Could you specifically test with 2017-03-18 and 2017-03-19? You may install it from rustup using rustup default nightly-2017-03-18. Just wanna confirm if it is really 2017-03-19 causing the problem for you.
No repro on my Debian VM either.
$ cargo clean
$ time RUST_INCREMENTAL=1 cargo build --release
Compiling incremental_bug v0.0.1 (file:///root/incremental_bug)
warning: field is never used: `name`
--> src/main.rs:9:5
|
9 | name : String,
| ^^^^^^^^^^^^^
|
= note: #[warn(dead_code)] on by default
warning: field is never used: `next`
--> src/main.rs:11:5
|
11 | next : U,
| ^^^^^^^^
|
= note: #[warn(dead_code)] on by default
Finished release [optimized] target(s) in 0.51 secs
real 0m0.613s
user 0m0.532s
sys 0m0.052s
$ rustc -vV
rustc 1.19.0-nightly (06fb4d256 2017-04-30)
binary: rustc
commit-hash: 06fb4d25642a3f223db1441972dd5962085cfba1
commit-date: 2017-04-30
host: x86_64-unknown-linux-gnu
release: 1.19.0-nightly
LLVM version: 4.0
@kennytm just tried continuously:
rustup override set nightly-2017-03-19
rustup override set nightly-2017-03-20
rustup override set nightly-2017-03-21
And first broken version is 2017-03-21.
rustc 1.17.0-nightly (134c4a0f0 2017-03-20)
binary: rustc
commit-hash: 134c4a0f08a3d1f55ea8968fbe728fa935c71698
commit-date: 2017-03-20
host: x86_64-unknown-linux-gnu
release: 1.17.0-nightly
LLVM version: 3.9
@kennytm hmm it seems that on that 03-21 it hangs forever. It works for 20+ minutes and keep working slowly increasing consumed ram.
The commit for 2017-03-21 is 1.17.0-nightly (134c4a0f0 2017-03-20)
The commit for 2017-03-20 is 1.17.0-nightly (6eb9960d3 2017-03-19)
PRs between the two are
$ git log --oneline --merges 6eb9960d3...134c4a0f0
134c4a0f08 Auto merge of #39628 - arielb1:shimmir, r=eddyb
244f893ed7 Auto merge of #40659 - frewsxcv:rollup, r=frewsxcv
6738cd4d47 Auto merge of #40281 - jimmycuadra:try-from-from-str, r=aturon
d49f86901c Rollup merge of #40587 - GuillaumeGomez:rustdoc-const-display, r=frewsxcv
de724ba9a6 Rollup merge of #40581 - TimNN:di-global-40, r=alexcrichton
4e9033124b Rollup merge of #40566 - clarcharr:never_error, r=sfackler
dedf9d3593 Rollup merge of #40554 - nrc:rls-data, r=alexcrichton
7471d9793c Rollup merge of #40532 - jseyfried:improve_tokenstream_quoter, r=nrc
1d1543d566 Rollup merge of #40521 - TimNN:panic-free-shift, r=alexcrichton
6480a00222 Rollup merge of #40398 - eddyb:struct-hint, r=nikomatsakis
573e906594 Rollup merge of #40281 - jimmycuadra:try-from-from-str, r=aturon
e5221f9397 Rollup merge of #40241 - Sawyer47:fix-39997, r=alexcrichton
Could only reproduce with
cargo +nightly-2017-03-21 rustc --release -- -Zincremental=.
The cause, is of course, the incremental compilation.
@kennytm
except RUSTFLAGS typed in this gist I had CARGO_INCREMENTAL=1 in my env.
https://gist.github.com/not-fl3/832b85ec9ff08537e479bb42ed4e2402
@kennytm checked your log from debian and found that in original post I used RUST_INCREMENTAL instead of CARGO_INCREMENTAL=1.
My bad, fixed original post now.
Reduced test case:
// rustc -Copt-level=2 -Zincremental=. main.rs
struct DS<U> {
name: String,
next: U,
}
fn add<U>(ds: DS<U>, name: String) -> DS<DS<U>> {
DS {
name: "?".to_owned(),
next: ds,
}
}
fn main() {
let deserializers = DS { name: "?".to_owned(), next: () };
let deserializers = add(deserializers, "?".to_owned());
let deserializers = add(deserializers, "?".to_owned());
let deserializers = add(deserializers, "?".to_owned());
let deserializers = add(deserializers, "?".to_owned());
let deserializers = add(deserializers, "?".to_owned());
let deserializers = add(deserializers, "?".to_owned());
let deserializers = add(deserializers, "?".to_owned()); // 0.7s
let deserializers = add(deserializers, "?".to_owned()); // 1.3s
let deserializers = add(deserializers, "?".to_owned()); // 2.4s
let deserializers = add(deserializers, "?".to_owned()); // 6.7s
// let deserializers = add(deserializers, "?".to_owned()); // 26.0s
// let deserializers = add(deserializers, "?".to_owned()); // 114.0s
}
Looks like some exponential blow-up.
In the 26s case, most time (51%) is spent on the RegisterCoalescer pass.
Looks like the same problem:
fn main() {
let t: std::result::Result<(), ()> = Ok(());
let f = t
.into_future()
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(())) // 0.69 secs
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(())) // 1.94 secs
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(())) // 20.68 secs
.and_then(|_| future::ok(())) // 40.58 secs
.and_then(|_| future::ok(())) // 80.70 secs
.and_then(|_| future::ok(())) // 160.59 secs
.and_then(|_| future::ok(())) // 321.88 secs
.and_then(|_| future::ok(())); // 642.42 secs
f.wait();
}
run: cargo聽run聽--bin=test_bin
(future - from https://crates.io/crates/futures)
rustc -vV
rustc 1.18.0-nightly (94e884b63 2017-04-27)
binary: rustc
commit-hash: 94e884b6307a59f1e6e64aa7ebc1996b651a7629
commit-date: 2017-04-27
host: x86_64-pc-windows-msvc
release: 1.18.0-nightly
LLVM version: 4.0
@kgv Not really the same issue. OP's code is slow to compile due to an LLVM pass, your code is slow to compile due to MIR optimization (specifically the "translation item collection" pass).
Adding -Z no-landing-pads makes things fast again. It looks like the problem is that LLVM somehow manages to cause an exponential # of landing pads while inlining.
cc @michaelwoerister
Presumably this problem could also be triggered by using a larger number of codegen-units (instead of incremental), right?
I don't quite understand how incremental compilation or the number of codegen units could interact with landing pad generation.
Yeah
No codegen-units needed:
// rustc -Copt-level=2 main.rs
extern "C" {
static MAYBE: bool;
}
struct MayUnwind;
impl Drop for MayUnwind {
fn drop(&mut self) {
if unsafe { MAYBE } {
panic!()
}
}
}
struct DS<U> {
may_unwind: MayUnwind,
name: String,
next: U,
}
fn add<U>(ds: DS<U>, name: String) -> DS<DS<U>> {
DS {
may_unwind: MayUnwind,
name: "?".to_owned(),
next: ds,
}
}
fn main() {
let deserializers = DS { may_unwind: MayUnwind, name: "?".to_owned(), next: () };
let deserializers = add(deserializers, "?".to_owned());
let deserializers = add(deserializers, "?".to_owned());
let deserializers = add(deserializers, "?".to_owned());
let deserializers = add(deserializers, "?".to_owned());
let deserializers = add(deserializers, "?".to_owned());
let deserializers = add(deserializers, "?".to_owned());
let deserializers = add(deserializers, "?".to_owned()); // 0.7s
let deserializers = add(deserializers, "?".to_owned()); // 1.3s
let deserializers = add(deserializers, "?".to_owned()); // 2.4s
let deserializers = add(deserializers, "?".to_owned()); // 6.7s
// let deserializers = add(deserializers, "?".to_owned()); // 26.0s
// let deserializers = add(deserializers, "?".to_owned()); // 114.0s
}
@arielb1: Still not fixed as of rustc 1.19.0-nightly (386b0b9d3 2017-05-14). Reopen?
time it)-Zno-landing-pads still works as making things fast again.Your test case is also slow with -Zincremental=. (it is fast if not incremental). Again adding -Zno-landing-pads (-Cpanic=abort) fixes it.
let deserializers = DS { may_unwind: MayUnwind, name: "?".to_owned(), next: () };
let deserializers = add(deserializers, "?".to_owned());
let deserializers = add(deserializers, "?".to_owned());
let deserializers = add(deserializers, "?".to_owned());
let deserializers = add(deserializers, "?".to_owned()); // 0.6s
let deserializers = add(deserializers, "?".to_owned()); // 0.6s
let deserializers = add(deserializers, "?".to_owned()); // 1.0s
let deserializers = add(deserializers, "?".to_owned()); // 2.2s
let deserializers = add(deserializers, "?".to_owned()); // 8.0s
let deserializers = add(deserializers, "?".to_owned()); // 34.6s
// let deserializers = add(deserializers, "?".to_owned());
// let deserializers = add(deserializers, "?".to_owned());
// let deserializers = add(deserializers, "?".to_owned());
@kennytm
Yup the inlinehint attribute seems to be still present. Got to get rid of it.
@kennytm
Actually I used an older version. With nightly to me it is fast with -C opt-level=2, but is slow with -C opt-level=3.
gist of LLVM IR: https://gist.github.com/arielb1/502c5847c9bbd747e22f82aa8fed695c - no inline hints, but opt "optimizes" this into 60k lines of IR.
Root cause is https://bugs.llvm.org/show_bug.cgi?id=33072. This variant affects all versions of rustc including 1.8.0:
fn f1(x: &mut u32) {
let odd = (-((*x & 1) as i32)) as u32;
*x = ((*x * 3 + 1) & odd)
+ ((*x / 2) & !odd);
}
fn f2(x: &mut u32) {
f1(x);
f1(x);
f1(x);
f1(x);
}
fn f3(x: &mut u32) {
f2(x);
f2(x);
f2(x);
f2(x);
}
fn f4(x: &mut u32) {
f3(x);
f3(x);
f3(x);
f3(x);
}
fn f5(x: &mut u32) {
f4(x);
f4(x);
f4(x);
f4(x);
}
fn f6(x: &mut u32) {
f5(x);
f5(x);
f5(x);
f5(x);
}
fn f7(x: &mut u32) {
f6(x);
f6(x);
f6(x);
f6(x);
}
fn f8(x: &mut u32) {
f7(x);
f7(x);
f7(x);
f7(x);
}
fn f9(x: &mut u32) {
f8(x);
f8(x);
f8(x);
f8(x);
}
fn fA(x: &mut u32) {
f9(x);
f9(x);
f9(x);
f9(x);
}
fn main() {
let mut x = 44;
fA(&mut x);
println!("{}", x);
}
It looks like this is worked around for 1.18 right?
@brson I don't think this is resolved. Take https://github.com/jonhoo/fantoccini/commit/79e16830ee7dc502a7ba3c3fde0172825ddbc9ba for example. On current nightly (2017-06-18), cargo build is nearly instantaneous, whereas cargo test --lib --no-run takes ~720s on my laptop, presumably due to the sequence of and_then's at https://github.com/jonhoo/fantoccini/blob/79e16830ee7dc502a7ba3c3fde0172825ddbc9ba/src/lib.rs#L1317.
@arielb1 #42771 still doesn't seem to solve this for me. cargo test --lib --no-run on https://github.com/jonhoo/fantoccini/commit/0802a57bcac2c50e569dc7649f3b90ae4e8787c9 for example still takes about ten minutes to compile on the latest nightly (which includes #42771). This is presumably because of the many and_then calls in the three tests at the bottom...
EDIT: in fact, compile time seems to have gone up if anything. Used to be ~750s, now 925s...
Going to tentatively reopen for now based on @jonhoo's report.
@jonhoo
Can you give me the output of compiling your program with -Z time-passes?
time: 0.007; rss: 55MB parsing
time: 0.000; rss: 55MB recursion limit
time: 0.000; rss: 55MB crate injection
time: 0.000; rss: 55MB plugin loading
time: 0.000; rss: 55MB plugin registration
time: 0.099; rss: 112MB expansion
time: 0.003; rss: 112MB maybe building test harness
time: 0.000; rss: 112MB maybe creating a macro crate
time: 0.000; rss: 112MB checking for inline asm in case the target doesn't support it
time: 0.001; rss: 112MB early lint checks
time: 0.000; rss: 112MB AST validation
time: 0.011; rss: 120MB name resolution
time: 0.002; rss: 120MB complete gated feature checking
time: 0.004; rss: 122MB lowering ast -> hir
time: 0.001; rss: 122MB indexing hir
time: 0.000; rss: 122MB attribute checking
time: 0.000; rss: 122MB language item collection
time: 0.000; rss: 122MB lifetime resolution
time: 0.000; rss: 122MB looking for entry point
time: 0.000; rss: 122MB looking for plugin registrar
time: 0.000; rss: 122MB loop checking
time: 0.000; rss: 122MB static item recursion checking
time: 0.005; rss: 122MB compute_incremental_hashes_map
time: 0.000; rss: 122MB load_dep_graph
time: 0.000; rss: 122MB stability index
time: 0.001; rss: 122MB stability checking
time: 0.009; rss: 132MB type collecting
time: 0.000; rss: 132MB impl wf inference
time: 0.031; rss: 154MB coherence checking
time: 0.000; rss: 154MB variance testing
time: 0.006; rss: 154MB wf checking
time: 0.006; rss: 158MB item-types checking
time: 0.205; rss: 164MB item-bodies checking
time: 0.021; rss: 168MB const checking
time: 0.001; rss: 168MB privacy checking
time: 0.000; rss: 168MB intrinsic checking
time: 0.000; rss: 168MB effect checking
time: 0.003; rss: 168MB match checking
time: 0.001; rss: 168MB liveness checking
time: 0.070; rss: 174MB borrow checking
time: 0.000; rss: 174MB reachability checking
time: 0.001; rss: 174MB death checking
time: 0.000; rss: 174MB unused lib feature checking
time: 0.014; rss: 174MB lint checking
time: 0.000; rss: 174MB resolving dependency formats
time: 0.001; rss: 174MB write metadata
time: 495.029; rss: 206MB translation item collection
time: 0.012; rss: 214MB codegen unit partitioning
time: 0.007; rss: 337MB internalize symbols
time: 807.783; rss: 337MB translation
time: 0.000; rss: 337MB assert dep graph
time: 0.000; rss: 337MB serialize dep graph
time: 0.002; rss: 324MB codegen passes [0]
time: 0.205; rss: 326MB llvm function passes [1]
time: 0.129; rss: 330MB llvm module passes [1]
time: 4.515; rss: 336MB codegen passes [1]
time: 4.852; rss: 332MB LLVM passes
time: 0.000; rss: 332MB serialize work products
time: 0.682; rss: 245MB running linker
time: 0.685; rss: 245MB linking
time: 0.001; rss: 174MB write metadata
time: 495.029; rss: 206MB translation item collection
time: 0.012; rss: 214MB codegen unit partitioning
time: 0.007; rss: 337MB internalize symbols
time: 807.783; rss: 337MB translation
is pretty curious. the extra time is spent between when codegen unit partitioning is printed and when internalize symbols is printed, but not sure if that helps?
@arielb1: just checking that you saw my follow-up?
I was a bit busy lately, but:
time: 807.783; rss: 337MB translation
time: 4.852; rss: 332MB LLVM passes
This bug causes excessive time in LLVM passes, not in translation, so I'm pretty sure it's a different bug. Please open a separate issue.
Most helpful comment
Could only reproduce with
The cause, is of course, the incremental compilation.