In working on a prototype futures library I've played around a bit with a small example program. Unfortunately this 20-line function has a 12 second compile time in debug mode. Looking at -Z time-passes, the worst offenders are:
$ rustup run nightly cargo rustc --example retry -- -Z time-passes
...
time: 2.266; rss: 87MB item-bodies checking
...
time: 0.000; rss: 95MB Prepare MIR codegen passes
time: 3.745; rss: 106MB translation item collection
time: 0.000; rss: 124MB write metadata
time: 9.259; rss: 124MB translation
time: 0.000; rss: 124MB assert dep graph
time: 0.000; rss: 124MB serialize dep graph
time: 0.021; rss: 73MB llvm function passes [0]
time: 0.005; rss: 74MB llvm module passes [0]
time: 0.365; rss: 79MB codegen passes [0]
time: 0.000; rss: 71MB codegen passes [0]
time: 0.402; rss: 71MB LLVM passes
time: 0.042; rss: 71MB running linker
time: 0.043; rss: 71MB linking
Some interesting data points:
FWIW, the perf record of this compile looks like the hottest functions are (those above 1%):
23.53% rustc librustc-031d5509.so [.] _$LT$ty..subst..Substs$LT$$u27$tcx$GT$$u20$as$u20$core..hash..Hash$GT$::hash::h67025cda0fa7ab1b
10.49% rustc librustc-031d5509.so [.] _$LT$collections..vec..Vec$LT$T$GT$$u20$as$u20$core..hash..Hash$GT$::hash::ha06e4f3f77f47962
4.92% rustc librustc-031d5509.so [.] _$LT$std..collections..HashMap$LT$K$C$$u20$V$C$$u20$S$GT$$GT$::get::h97a8507e2c811a37
4.88% rustc librustc-031d5509.so [.] _$LT$std..collections..HashMap$LT$K$C$$u20$V$C$$u20$S$GT$$GT$::get::hfbcbea9f0e6ca77f
2.79% rustc libstd-031d5509.so [.] mallocx
2.73% rustc libstd-031d5509.so [.] sdallocx
2.43% rustc librustc-031d5509.so [.] _$LT$ty..sty..TypeVariants$LT$$u27$tcx$GT$$u20$as$u20$core..hash..Hash$GT$::hash::h13defa770fe420c4
2.16% rustc librustc-031d5509.so [.] rustc::ty::fold::TypeFolder::fold_ty::hc4acd00a49a1567e
1.71% rustc librustc-031d5509.so [.] _$LT$ty..sty..TypeVariants$LT$$u27$tcx$GT$$u20$as$u20$core..cmp..PartialEq$GT$::eq::he128d17c0cf9651e
1.57% rustc librustc-031d5509.so [.] rustc::ty::context::TyCtxt::intern_ty::h30fe821a54c8bb56
1.53% rustc libc-2.19.so [.] __memmove_ssse3_back
1.24% rustc librustc-031d5509.so [.] rustc::ty::context::TyCtxt::mk_substs::hf9ef6f493c09a1fb
1.19% rustc librustc-031d5509.so [.] _$LT$ty..sty..Binder$LT$T$GT$$u20$as$u20$core..hash..Hash$GT$::hash::h1d6271393e0c767b
1.05% rustc librustc-031d5509.so [.] traits..ObligationCause$LT$$u27$static$GT$::drop.80363::hae63a8e62161292a
1.04% rustc librustc-031d5509.so [.] _$LT$infer..resolve..OpportunisticTypeResolver$LT$$u27$a$C$$u20$$u27$tcx$GT$$u20$as$u20$ty..fold..TypeFolder$LT$$u27$tcx$GT$$GT$::fold_ty::hffa
We seem to be hashing... a lot!
Yes, that definitely looks excessive. Thanks for the report!
Reduced test case:
extern crate futures;
use futures::*;
fn main() {
    let a: Box<Future<Item = i32, Error = ()>> = loop {};
    a
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t);
}
I believe this is the same issue as in #22204 and #31849. @nikomatsakis did an attempt at adding a cache for projections in #32287 but there were some issues with higher ranked lifetimes.
@dotdash that test case seems to stress the pathological performance in "item-bodies checking", but it didn't seem to reproduce the massive amount of time in trans (particularly the item collector). Perhaps that's the same as #22204 which @Marwes points out, but there may be something else lurking?
@Marwes yeah I'd believe that the typechecking performance cliff is related to that, but that's only about 15% of the original test's compile time, the remaining parts were all in trans.
@alexcrichton I _think_ that the difference is just that trans hits the bad code path less often with this variant, not that there's actually a problem in trans. The profile you provided kind of supports that idea as well.
@alexcrichton Hmm, I think I got the times of the chained iterator example mixed up with the times I get for compiling the parser for embed_lang, which definitely is slow due to the lack of cached projection types.
...
time: 9.079; rss: 101MB item-bodies checking
...
time: 0.009; rss: 108MB Prepare MIR codegen passes
time: 198.739; rss: 157MB translation item collection
time: 0.129; rss: 324MB write metadata
time: 636.004; rss: 315MB translation
time: 0.000; rss: 315MB assert dep graph
time: 0.000; rss: 315MB serialize dep graph
time: 0.601; rss: 198MB llvm function passes [0]
time: 0.197; rss: 200MB llvm module passes [0]
time: 13.754; rss: 199MB codegen passes [0]
time: 0.008; rss: 76MB codegen passes [0]
time: 15.669; rss: 73MB LLVM passes
time: 0.633; rss: 75MB linking
time: 198.739; rss: 157MB translation item collection
wat!
Is this for @dotdash's test case?
@michaelwoerister Sorry I wasn't clear enough! That was for compiling embed_lang's parser (cargo rustc --features test -- -Z time-passes to be precise). To be fair, all times are actually doubled, as passing --features test creates two concrete parsers (I accidentally passed that parameter out of habit).
So for reference I tried Alex's future stuff with my projection cache and it made type-checking basically free, as well as lowering the cost of everything else substantially -- but trans and friends remained fairly expensive. I don't have the measurements anymore unfortunately. I hope to be landing the projection cache soon-ish, have to close up the unsoundness that the cache revealed first.
@nikomatsakis - if you want I can also try it in the crates timing thing I was working on with Alex.
Note that the projection cache landed. So type-checking should be faster, if not the backend. More work is needed there -- this may be because the projection cache is tied to an inference context, and we make a fresh one each time in trans, but there may be other factors.
I think this is the same issue as https://github.com/rust-lang/rust/issues/38528, so closing in favor of that.