In working on a prototype futures library I've played around a bit with a small example program. Unfortunately this 20-line function has a 12 second compile time in debug mode. Looking at -Z time-passes, the worst offenders are:
$ rustup run nightly cargo rustc --example retry -- -Z time-passes
...
time: 2.266; rss: 87MB item-bodies checking
...
time: 0.000; rss: 95MB Prepare MIR codegen passes
time: 3.745; rss: 106MB translation item collection
time: 0.000; rss: 124MB write metadata
time: 9.259; rss: 124MB translation
time: 0.000; rss: 124MB assert dep graph
time: 0.000; rss: 124MB serialize dep graph
time: 0.021; rss: 73MB llvm function passes [0]
time: 0.005; rss: 74MB llvm module passes [0]
time: 0.365; rss: 79MB codegen passes [0]
time: 0.000; rss: 71MB codegen passes [0]
time: 0.402; rss: 71MB LLVM passes
time: 0.042; rss: 71MB running linker
time: 0.043; rss: 71MB linking
Some interesting data points:
FWIW, the perf record of this compile looks like the hottest functions are (those above 1%):
23.53% rustc librustc-031d5509.so [.] _$LT$ty..subst..Substs$LT$$u27$tcx$GT$$u20$as$u20$core..hash..Hash$GT$::hash::h67025cda0fa7ab1b
10.49% rustc librustc-031d5509.so [.] _$LT$collections..vec..Vec$LT$T$GT$$u20$as$u20$core..hash..Hash$GT$::hash::ha06e4f3f77f47962
4.92% rustc librustc-031d5509.so [.] _$LT$std..collections..HashMap$LT$K$C$$u20$V$C$$u20$S$GT$$GT$::get::h97a8507e2c811a37
4.88% rustc librustc-031d5509.so [.] _$LT$std..collections..HashMap$LT$K$C$$u20$V$C$$u20$S$GT$$GT$::get::hfbcbea9f0e6ca77f
2.79% rustc libstd-031d5509.so [.] mallocx
2.73% rustc libstd-031d5509.so [.] sdallocx
2.43% rustc librustc-031d5509.so [.] _$LT$ty..sty..TypeVariants$LT$$u27$tcx$GT$$u20$as$u20$core..hash..Hash$GT$::hash::h13defa770fe420c4
2.16% rustc librustc-031d5509.so [.] rustc::ty::fold::TypeFolder::fold_ty::hc4acd00a49a1567e
1.71% rustc librustc-031d5509.so [.] _$LT$ty..sty..TypeVariants$LT$$u27$tcx$GT$$u20$as$u20$core..cmp..PartialEq$GT$::eq::he128d17c0cf9651e
1.57% rustc librustc-031d5509.so [.] rustc::ty::context::TyCtxt::intern_ty::h30fe821a54c8bb56
1.53% rustc libc-2.19.so [.] __memmove_ssse3_back
1.24% rustc librustc-031d5509.so [.] rustc::ty::context::TyCtxt::mk_substs::hf9ef6f493c09a1fb
1.19% rustc librustc-031d5509.so [.] _$LT$ty..sty..Binder$LT$T$GT$$u20$as$u20$core..hash..Hash$GT$::hash::h1d6271393e0c767b
1.05% rustc librustc-031d5509.so [.] traits..ObligationCause$LT$$u27$static$GT$::drop.80363::hae63a8e62161292a
1.04% rustc librustc-031d5509.so [.] _$LT$infer..resolve..OpportunisticTypeResolver$LT$$u27$a$C$$u20$$u27$tcx$GT$$u20$as$u20$ty..fold..TypeFolder$LT$$u27$tcx$GT$$GT$::fold_ty::hffa
We seem to be hashing... a lot!
Yes, that definitely looks excessive. Thanks for the report!
Reduced test case:
extern crate futures;
use futures::*;
fn main() {
    let a: Box<Future<Item = i32, Error = ()>> = loop {};
    a
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t);
}
I believe this is the same issue as in #22204 and #31849. @nikomatsakis did an attempt at adding a cache for projections in #32287 but there were some issues with higher ranked lifetimes.
@dotdash that test case seems to stress the pathological performance in "item-bodies checking", but it didn't seem to reproduce the massive amount of time in trans (particularly the item collector). Perhaps that's the same as #22204 which @Marwes points out, but there may be something else lurking?
@Marwes yeah I'd believe that the typechecking performance cliff is related to that, but that's only about 15% of the original test's compile time, the remaining parts were all in trans.
@alexcrichton I _think_ that the difference is just that trans hits the bad code path less often with this variant, not that there's actually a problem in trans. The profile you provided kind of supports that idea as well.
@alexcrichton Hmm, I think I got the times of the chained iterator example mixed up with the times I get for compiling the parser for embed_lang, which definitely is slow due to the lack of cached projection types.
...
time: 9.079; rss: 101MB item-bodies checking
...
time: 0.009; rss: 108MB Prepare MIR codegen passes
time: 198.739; rss: 157MB translation item collection
time: 0.129; rss: 324MB write metadata
time: 636.004; rss: 315MB translation
time: 0.000; rss: 315MB assert dep graph
time: 0.000; rss: 315MB serialize dep graph
time: 0.601; rss: 198MB llvm function passes [0]
time: 0.197; rss: 200MB llvm module passes [0]
time: 13.754; rss: 199MB codegen passes [0]
time: 0.008; rss: 76MB codegen passes [0]
time: 15.669; rss: 73MB LLVM passes
time: 0.633; rss: 75MB linking
time: 198.739; rss: 157MB translation item collection
wat!
Is this for @dotdash's test case?
@michaelwoerister Sorry I wasn't clear enough! That was for compiling embed_lang's parser (cargo rustc --features test -- -Z time-passes to be precise). To be fair, all times are actually doubled, as passing --features test creates two concrete parsers (I accidentally passed that parameter out of habit).
So for reference I tried Alex's future stuff with my projection cache and it made type-checking basically free, as well as lowering the cost of everything else substantially -- but trans and friends remained fairly expensive. I don't have the measurements anymore unfortunately. I hope to be landing the projection cache soon-ish, have to close up the unsoundness that the cache revealed first.
@nikomatsakis - if you want I can also try it in the crates timing thing I was working on with Alex.
Note that the projection cache landed. So type-checking should be faster, if not the backend. More work is needed there -- this may be because the projection cache is tied to an inference context, and we make a fresh one each time in trans, but there may be other factors.
I think this is the same issue as https://github.com/rust-lang/rust/issues/38528, so closing in favor of that.