Rust: Nonlinear compile time blow-up with deeply nested types

Created on 22 Dec 2016 · 40Comments · Source: rust-lang/rust

Compiling the postgres-tokio crate at https://github.com/sfackler/rust-postgres/commit/d27518ba76d76ccaa59b3ccd63e981bd8bd0ef33 goes from 5 seconds to 45 seconds on nightly if the two .boxed() calls in the middle of this call chain are removed: https://github.com/sfackler/rust-postgres/blob/d27518ba76d76ccaa59b3ccd63e981bd8bd0ef33/postgres-tokio/src/lib.rs#L342-L408.

Looks like 15 seconds is spent in translation item collection, and 39 seconds is spent in translation:

time: 0.004 parsing
time: 0.005 recursion limit
time: 0.000 crate injection
time: 0.001 plugin loading
time: 0.000 plugin registration
time: 0.238 expansion
time: 0.000 maybe building test harness
time: 0.000 maybe creating a macro crate
time: 0.000 checking for inline asm in case the target doesn't support it
time: 0.000 complete gated feature checking
time: 0.008 early lint checks
time: 0.000 AST validation
time: 0.005 name resolution
time: 0.004 lowering ast -> hir
time: 0.002 indexing hir
time: 0.000 attribute checking
time: 0.001 language item collection
time: 0.001 lifetime resolution
time: 0.000 looking for entry point
time: 0.000 looking for plugin registrar
time: 0.001 region resolution
time: 0.000 loop checking
time: 0.000 static item recursion checking
time: 0.005 compute_incremental_hashes_map
time: 0.000 load_dep_graph
time: 0.000 stability index
time: 0.001 stability checking
time: 0.011 type collecting
time: 0.000 variance inference
time: 0.000 impl wf inference
time: 0.021 coherence checking
time: 0.008 wf checking
time: 0.003 item-types checking
time: 0.500 item-bodies checking
time: 0.000 drop-impl checking
time: 0.003 const checking
time: 0.001 privacy checking
time: 0.000 intrinsic checking
time: 0.000 effect checking
time: 0.001 match checking
time: 0.000 liveness checking
time: 0.002 rvalue checking
time: 0.014 MIR dump
  time: 0.001   SimplifyCfg
  time: 0.002   QualifyAndPromoteConstants
  time: 0.002   TypeckMir
  time: 0.000   SimplifyBranches
  time: 0.001   SimplifyCfg
time: 0.006 MIR cleanup and validation
time: 0.006 borrow checking
time: 0.000 reachability checking
time: 0.000 death checking
time: 0.000 unused lib feature checking
time: 1.366 lint checking
time: 0.000 resolving dependency formats
  time: 0.000   NoLandingPads
  time: 0.001   SimplifyCfg
  time: 0.001   EraseRegions
  time: 0.000   AddCallGuards
  time: 0.008   ElaborateDrops
  time: 0.000   NoLandingPads
  time: 0.001   SimplifyCfg
  time: 0.001   InstCombine
  time: 0.000   Deaggregator
  time: 0.000   CopyPropagation
  time: 0.001   SimplifyLocals
  time: 0.000   AddCallGuards
  time: 0.000   PreTrans
time: 0.015 MIR optimisations
  time: 0.001   write metadata
  time: 15.086  translation item collection
  time: 0.020   codegen unit partitioning
  time: 0.022   internalize symbols
time: 39.376    translation
time: 0.000 assert dep graph
time: 0.000 serialize dep graph
  time: 0.089   llvm function passes [0]
  time: 0.037   llvm module passes [0]
  time: 2.148   codegen passes [0]
  time: 0.001   codegen passes [0]
time: 2.432 LLVM passes
time: 0.000 serialize work products
time: 0.084 linking

Things are significantly worse on 1.13 - 2 minutes in translation!

Some discussion in IRC: https://botbot.me/mozilla/rust-internals/2016-12-22/?msg=78294648&page=1

cc @aturon

UPDATE: #40280 was closed as a duplicate of this. It had the following sample code:

let &(first, B, C, D, E, F, G, H) = self;
let iter = first.buffers_list();
let iter = iter.chain(B.buffers_list());
let iter = iter.chain(C.buffers_list());
let iter = iter.chain(D.buffers_list());
let iter = iter.chain(E.buffers_list());
let iter = iter.chain(F.buffers_list());
let iter = iter.chain(G.buffers_list());
let iter = iter.chain(H.buffers_list());
Box::new(iter)

--nmatsakis

C-bug I-compiletime P-medium T-compiler

Source

sfackler

👍1

Most helpful comment

Another update: the projection cache actually exists, but we're discarding them every time by wiping the InferCtxt (with snapshots). I have created a diagram to show how this recurs:
Diagram

So basically we need to implement this FIXME: https://github.com/rust-lang/rust/blob/554fe71ac6172ce9c0baeabbfd692280dcc893fe/src/librustc/traits/project.rs#L1548-L1550

ishitatsuyuki on 17 Feb 2018

❤11

All 40 comments

Things are significantly worse on 1.13 - 2 minutes in translation!

This may be the effect of the projection cache.

nikomatsakis on 22 Dec 2016

cc @alexcrichton This is obviously a show-stopper for futures.

aturon on 22 Dec 2016

I can do some profiling. I've had some plans for improving collection/trans that I think may be related. One question to try and answer is what %age of this is just "we are making more code" vs "we are wasting time doing things in trait selection that could be cached". I have observed the latter from time to time and had some thoughts on how to fix it.

nikomatsakis on 22 Dec 2016

I've also seen this before, with tokio-socks5 as well. Removing just a handful of the trait objects in that file makes the compile time of the crate shoot from 2.34s to 89.52s (!!)

alexcrichton on 25 Dec 2016

Here's a simple example of something that takes a very long time to compile:

future::ok::<(),()>(()).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(()));

dwrensha on 20 Jan 2017

😄10

Closed #40280 as a duplicate, moved some example code into the issue header.

nikomatsakis on 9 Mar 2017

Just a stupid question ‒ the „translation item collection“, which is one of the two culprits here, is listed in the „MIR optimisations“. Does it really have to happen at all on non-optimised debug build?

vorner on 18 Mar 2017

It's natural to read it that way, but the headers are actual after their group, so it's part of translation

bluss on 18 Mar 2017

This seems to be hard enough to take some time fixing. So I thought of a workaround, if someone is also interested. It uses the trick with placing trait objects to split the chains of modifiers, but only on testing/debug builds where the compilation speed matters, while it keeps the complex but hopefully faster concrete types in release build:

This one is for streams (that's what I needed), but can obviously work for futures or other things as well:

#[cfg(debug_assertions)]
fn test_boxed<T, E, S>(s: S) -> Box<Stream<Item = T, Error = E>>
    where S: Stream<Item = T, Error = E> + 'static
{
    Box::new(s)
}
#[cfg(not(debug_assertions))]
fn test_boxed<S>(s: S) -> S {
    s
}

(I didn't find any better config option than the debug_assertions, but I hope that one is good enough)

vorner on 27 Mar 2017

👍1

Nominating in place of https://github.com/rust-lang/rust/issues/42941. cc @eddyb (nominator)

Mark-Simulacrum on 2 Jul 2017

Repeating another bad case from #42941:

extern crate futures;
use futures::{future, IntoFuture, Future};
fn main() {
    let t: std::result::Result<(), ()> = Ok(());
    let f = t
            .into_future()
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()));
    f.wait();
}

The code above takes ~750s to compile on my laptop (you can make it shorter/longer by removing/adding .and_then calls):

...
  time: 0.000; rss: 108MB       write metadata
  time: 472.840; rss: 110MB     translation item collection
  time: 0.001; rss: 112MB       codegen unit partitioning
  time: 0.001; rss: 133MB       internalize symbols
time: 732.741; rss: 133MB       translation
...

The extra ~250s is spent between codegen unit partitioning and internalize symbols.

jonhoo on 2 Jul 2017

👍3

Another observation from #42941 about why this is becoming even more important is that the newly-released hyper 0.11 uses futures everywhere, which will likely cause many more programs to use chains of futures like this going forward.

jonhoo on 6 Jul 2017

triage: P-high

We are raising to "high priority" to at least do some investigation and try to determine whether revised trait solving strategies will be of use here.

nikomatsakis on 6 Jul 2017

status: waiting for niko

arielb1 on 10 Aug 2017

Bump. This really hurts impl Trait (boxing hides this issue).

ishitatsuyuki on 31 Aug 2017

I'm not sure if this is fixed... I'm on nightly, version 1.23.0-nightly on Arch Linux...

I get the following in -Ztime-passes:

  time: 0.001; rss: 57MB    parsing
  time: 0.000; rss: 57MB    recursion limit
  time: 0.000; rss: 57MB    crate injection
  time: 0.000; rss: 57MB    plugin loading
  time: 0.000; rss: 57MB    plugin registration
  time: 0.054; rss: 122MB   expansion
  time: 0.000; rss: 122MB   maybe building test harness
  time: 0.000; rss: 122MB   maybe creating a macro crate
  time: 0.000; rss: 122MB   creating allocators
  time: 0.000; rss: 122MB   checking for inline asm in case the target doesn't support it
  time: 0.000; rss: 122MB   AST validation
  time: 0.006; rss: 128MB   name resolution
  time: 0.000; rss: 128MB   complete gated feature checking
  time: 0.000; rss: 128MB   lowering ast -> hir
  time: 0.000; rss: 128MB   early lint checks
  time: 0.001; rss: 128MB   indexing hir
  time: 0.000; rss: 128MB   attribute checking
  time: 0.000; rss: 128MB   lifetime resolution
  time: 0.000; rss: 128MB   looking for entry point
  time: 0.000; rss: 128MB   looking for plugin registrar
  time: 0.000; rss: 128MB   loop checking
  time: 0.000; rss: 128MB   static item recursion checking
  time: 0.000; rss: 132MB   stability checking
  time: 0.001; rss: 135MB   type collecting
  time: 0.000; rss: 135MB   outlives testing
  time: 0.000; rss: 135MB   impl wf inference
  time: 0.000; rss: 135MB   coherence checking
  time: 0.000; rss: 135MB   variance testing
  time: 0.001; rss: 139MB   wf checking
  time: 0.000; rss: 139MB   item-types checking
  time: 0.080; rss: 173MB   item-bodies checking
  time: 0.009; rss: 177MB   const checking
  time: 0.001; rss: 177MB   privacy checking
  time: 0.000; rss: 177MB   intrinsic checking
  time: 0.000; rss: 177MB   match checking
  time: 0.000; rss: 177MB   liveness checking
  time: 0.679; rss: 177MB   borrow checking
  time: 0.000; rss: 177MB   MIR borrow checking
  time: 0.000; rss: 177MB   MIR effect checking
  time: 0.000; rss: 177MB   death checking
  time: 0.000; rss: 177MB   unused lib feature checking
  time: 0.009; rss: 183MB   lint checking
  time: 0.000; rss: 183MB   resolving dependency formats
    time: 0.000; rss: 183MB write metadata

so this looks very similar. In fact, I too have code with a bunch of:

      future_maker()
           .and_then( futurey_stuff())
           .and_then( futurey_stuff())
           .and_then( futurey_stuff())
           .and_then( futurey_stuff())
           .and_then( futurey_stuff())
           .and_then( futurey_stuff())

Sorry if it's a bother... just wasn't sure where to ask and this seems to be the most recent reference.

Does it matter that some of the and_thens have move closures instead?

hemangandhi on 29 Oct 2017

Of course not. Particularly for the futures case generators will help, but @nikomatsakis doesn't seem to be working on this.

ishitatsuyuki on 29 Oct 2017

Just to double-check, that'd be std::ops::Generator?

hemangandhi on 29 Oct 2017

The doc is here and the futures helper is here.

ishitatsuyuki on 29 Oct 2017

❤1

Why would generators help? It's a trait, it doesn't do away with the deeply nested types.

Ralith on 29 Oct 2017

Generators help because they are an alternative to combinators. When you write a generator, you get a single (anonymous) type for the entire state machine; when you use combinators, you build the state machine out of nested types.

rpjohnst on 29 Oct 2017

👍1

triage: P-medium

I'm going to lower this from P-high to reflect reality. I'm still eager to investigate but haven't had time, and making it P-high is not helping =)

nikomatsakis on 2 Nov 2017

So I finally did some work here. In particular, I extracted out a self-contained example that seems to reproduce the problem, which let me do a bit of experimentation. The gist contains two variants of the example; one that models Future quite closely (though without IntoFuture), and the other which removes the Output type. The timing results are quite interesting. See if you can spot the outlier. ;)

             size
variant      4      8      12      16
full         .19s   .34s   2.5s    36.9s
no-output    .19s   .26s    .9s     9.8s

I haven't had time to dig deeper than that yet.

nikomatsakis on 3 Nov 2017

😄6

These can probably be considered even more self-contained than what @nikomatsakis have made in this post, and IMO they show the issue more consistently (reproducible exponential blowups by a factor of approximately 2 for each line of .chain(empty()) in every compilation step I specifically highlighted below) so I'll post them:

Generic version

Code

pub fn foo<T: 'static>() -> Box<Iterator<Item = [T; 17]>> {
    use std::iter::empty;

    Box::new(empty()
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty()) // 10th .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty()) // 17th .chain(empty())
    )
}

$ rustc +nightly --crate-type=lib foo_generic.rs -Z time-passes
...
  time: 34.801; rss: 92MB   item-bodies checking
...

All other timings are <0.3s.

Full output

  time: 0.001; rss: 43MB    parsing
  time: 0.000; rss: 43MB    recursion limit
  time: 0.000; rss: 43MB    crate injection
  time: 0.000; rss: 43MB    plugin loading
  time: 0.000; rss: 43MB    plugin registration
  time: 0.068; rss: 68MB    expansion
  time: 0.000; rss: 68MB    maybe building test harness
  time: 0.000; rss: 68MB    maybe creating a macro crate
  time: 0.000; rss: 68MB    creating allocators
  time: 0.000; rss: 68MB    checking for inline asm in case the target doesn't support it
  time: 0.000; rss: 68MB    AST validation
  time: 0.006; rss: 70MB    name resolution
  time: 0.000; rss: 70MB    complete gated feature checking
  time: 0.000; rss: 71MB    lowering ast -> hir
  time: 0.000; rss: 71MB    early lint checks
  time: 0.000; rss: 71MB    indexing hir
  time: 0.000; rss: 71MB    attribute checking
  time: 0.000; rss: 71MB    load query result cache
  time: 0.000; rss: 71MB    lifetime resolution
  time: 0.000; rss: 71MB    looking for entry point
  time: 0.000; rss: 71MB    looking for plugin registrar
  time: 0.000; rss: 71MB    loop checking
  time: 0.000; rss: 71MB    static item recursion checking
  time: 0.000; rss: 71MB    stability checking
  time: 0.001; rss: 75MB    type collecting
  time: 0.000; rss: 75MB    outlives testing
  time: 0.000; rss: 75MB    impl wf inference
  time: 0.000; rss: 75MB    coherence checking
  time: 0.000; rss: 75MB    variance testing
  time: 0.005; rss: 80MB    wf checking
  time: 0.000; rss: 80MB    item-types checking
  time: 34.801; rss: 92MB   item-bodies checking
  time: 0.012; rss: 95MB    const checking
  time: 0.001; rss: 95MB    privacy checking
  time: 0.000; rss: 95MB    intrinsic checking
  time: 0.000; rss: 95MB    match checking
  time: 0.000; rss: 95MB    liveness checking
  time: 0.003; rss: 95MB    borrow checking
  time: 0.000; rss: 95MB    MIR borrow checking
  time: 0.000; rss: 95MB    MIR effect checking
  time: 0.000; rss: 95MB    death checking
  time: 0.000; rss: 95MB    unused lib feature checking
  time: 0.018; rss: 100MB   lint checking
  time: 0.000; rss: 100MB   resolving dependency formats
    time: 0.002; rss: 100MB write metadata
    time: 0.000; rss: 100MB translation item collection
    time: 0.000; rss: 100MB codegen unit partitioning
    time: 0.000; rss: 114MB translate to LLVM IR
    time: 0.000; rss: 114MB assert dep graph
    time: 0.000; rss: 114MB serialize dep graph
  time: 0.267; rss: 114MB   translation
    time: 0.000; rss: 114MB llvm function passes [lib0]
    time: 0.000; rss: 114MB llvm module passes [lib0]
    time: 0.001; rss: 115MB codegen passes [lib0]
  time: 0.005; rss: 115MB   LLVM passes
  time: 0.000; rss: 113MB   serialize work products
  time: 0.001; rss: 113MB   linking

Specialized version

Code

pub fn foo() -> Box<Iterator<Item = ()>> {
    use std::iter::empty;

    Box::new(empty()
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty()) // 10th .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty())
        .chain(empty()) // 17th .chain(empty())
    )
}

$ rustc +nightly --crate-type=lib foo_specialized.rs -Z time-passes
...
  time: 43.160; rss: 89MB   item-bodies checking
...
    time: 34.286; rss: 99MB translation item collection
...
    time: 129.115; rss: 122MB   translate to LLVM IR
...
  time: 163.680; rss: 122MB translation
...
  time: 28.048; rss: 122MB  LLVM passes
...

All other timings are <0.2s.

Full output

  time: 0.001; rss: 41MB    parsing
  time: 0.000; rss: 41MB    recursion limit
  time: 0.000; rss: 41MB    crate injection
  time: 0.000; rss: 41MB    plugin loading
  time: 0.000; rss: 41MB    plugin registration
  time: 0.057; rss: 66MB    expansion
  time: 0.000; rss: 66MB    maybe building test harness
  time: 0.000; rss: 66MB    maybe creating a macro crate
  time: 0.000; rss: 66MB    creating allocators
  time: 0.000; rss: 66MB    checking for inline asm in case the target doesn't support it
  time: 0.000; rss: 66MB    AST validation
  time: 0.006; rss: 68MB    name resolution
  time: 0.000; rss: 68MB    complete gated feature checking
  time: 0.000; rss: 69MB    lowering ast -> hir
  time: 0.000; rss: 69MB    early lint checks
  time: 0.000; rss: 69MB    indexing hir
  time: 0.000; rss: 69MB    attribute checking
  time: 0.000; rss: 69MB    load query result cache
  time: 0.000; rss: 69MB    lifetime resolution
  time: 0.000; rss: 69MB    looking for entry point
  time: 0.000; rss: 69MB    looking for plugin registrar
  time: 0.000; rss: 69MB    loop checking
  time: 0.000; rss: 69MB    static item recursion checking
  time: 0.000; rss: 69MB    stability checking
  time: 0.001; rss: 73MB    type collecting
  time: 0.000; rss: 73MB    outlives testing
  time: 0.000; rss: 73MB    impl wf inference
  time: 0.000; rss: 73MB    coherence checking
  time: 0.000; rss: 73MB    variance testing
  time: 0.002; rss: 75MB    wf checking
  time: 0.000; rss: 75MB    item-types checking
  time: 43.160; rss: 89MB   item-bodies checking
  time: 0.014; rss: 91MB    const checking
  time: 0.001; rss: 91MB    privacy checking
  time: 0.000; rss: 91MB    intrinsic checking
  time: 0.000; rss: 91MB    match checking
  time: 0.000; rss: 91MB    liveness checking
  time: 0.003; rss: 94MB    borrow checking
  time: 0.000; rss: 94MB    MIR borrow checking
  time: 0.000; rss: 94MB    MIR effect checking
  time: 0.000; rss: 94MB    death checking
  time: 0.000; rss: 94MB    unused lib feature checking
  time: 0.018; rss: 96MB    lint checking
  time: 0.000; rss: 96MB    resolving dependency formats
    time: 0.001; rss: 96MB  write metadata
    time: 34.286; rss: 99MB translation item collection
    time: 0.001; rss: 99MB  codegen unit partitioning
    time: 0.000; rss: 114MB llvm function passes [lib0]
    time: 0.000; rss: 114MB llvm module passes [lib0]
    time: 0.006; rss: 114MB llvm function passes [lib4]
    time: 0.000; rss: 115MB llvm function passes [lib2]
    time: 0.000; rss: 115MB llvm module passes [lib2]
    time: 0.002; rss: 115MB llvm module passes [lib4]
    time: 0.001; rss: 117MB llvm function passes [lib5]
    time: 0.000; rss: 118MB llvm module passes [lib5]
    time: 0.000; rss: 117MB llvm function passes [lib3]
    time: 0.000; rss: 118MB llvm module passes [lib3]
    time: 0.011; rss: 118MB codegen passes [lib2]
    time: 0.011; rss: 117MB codegen passes [lib0]
    time: 0.000; rss: 120MB llvm function passes [lib8]
    time: 0.000; rss: 121MB llvm module passes [lib8]
    time: 0.012; rss: 122MB codegen passes [lib5]
    time: 0.004; rss: 120MB codegen passes [lib3]
    time: 0.008; rss: 122MB codegen passes [lib8]
    time: 0.134; rss: 122MB codegen passes [lib4]
    time: 0.001; rss: 122MB llvm function passes [lib1]
    time: 0.000; rss: 122MB llvm function passes [lib6]
    time: 0.001; rss: 122MB llvm module passes [lib1]
    time: 0.000; rss: 122MB llvm module passes [lib6]
    time: 0.000; rss: 122MB llvm function passes [lib7]
    time: 0.000; rss: 122MB llvm module passes [lib7]
    time: 129.115; rss: 122MB   translate to LLVM IR
    time: 0.000; rss: 122MB assert dep graph
    time: 0.000; rss: 122MB serialize dep graph
  time: 163.680; rss: 122MB translation
    time: 0.000; rss: 121MB llvm function passes [lib9]
    time: 0.000; rss: 121MB llvm module passes [lib9]
    time: 0.003; rss: 121MB codegen passes [lib9]
    time: 0.008; rss: 121MB codegen passes [lib7]
    time: 0.009; rss: 122MB codegen passes [lib1]
    time: 0.011; rss: 122MB codegen passes [lib6]
  time: 28.048; rss: 122MB  LLVM passes
  time: 0.000; rss: 122MB   serialize work products
  time: 0.003; rss: 122MB   linking

Rustc version:

$ rustc +nightly --version
rustc 1.23.0-nightly (827cb0d61 2017-11-26)

hcpl on 3 Dec 2017

👍2 ❤1

I'm digging on this. Some easy profiling:

ishitatsuyuki on 26 Dec 2017

After some digging it seems that https://github.com/rust-lang/rust/blob/ebddfcb0b19471d2277e05cfbc2ad8caa81637b7/src/librustc/ty/subst.rs#L324 may be a cause.

I only have little knowledge regarding the compiler, but if I'm correct this relates to @nikomatsakis experiment on the number of output types. This looks like a place we can do a exponential blowup, by the factor of parameters.

ishitatsuyuki on 26 Dec 2017

I've tried profiling rustc and getting a backtrace for the generic version by @hcpl above at some random moment of time.

Likely relevant lines from the backtrace

#16 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#17 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013
#18 rustc::infer::InferCtxt::probe (self=<optimized out>, f=...) at librustc/infer/mod.rs:905
#19 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=<optimized out>, obligation_trait_ref=<optimized out>, candidate_set=0x7fffeca9f3d0) at librustc/traits/project.rs:1012
#20 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:817
--
#49 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#50 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013
#51 rustc::infer::InferCtxt::probe (self=<optimized out>, f=...) at librustc/infer/mod.rs:905
#52 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=<optimized out>, obligation_trait_ref=<optimized out>, candidate_set=0x7fffecaa1550) at librustc/traits/project.rs:1012
#53 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:817
--
#82 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#83 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecaa3650, obligation_trait_ref=<optimized out>) at librustc/traits/project.rs:1174
#84 rustc::traits::project::confirm_candidate (obligation=0x7fffecaa3650, obligation_trait_ref=<optimized out>, selcx=<optimized out>, candidate=...) at librustc/traits/project.rs:1161
#85 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:878
#86 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=<optimized out>) at librustc/traits/project.rs:541
--
#114 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#115 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013
#116 rustc::infer::InferCtxt::probe (self=<optimized out>, f=...) at librustc/infer/mod.rs:905
#117 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=<optimized out>, obligation_trait_ref=<optimized out>, candidate_set=0x7fffecaa55f0) at librustc/traits/project.rs:1012
#118 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:817
--
#147 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#148 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013
#149 rustc::infer::InferCtxt::probe (self=<optimized out>, f=...) at librustc/infer/mod.rs:905
#150 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=<optimized out>, obligation_trait_ref=<optimized out>, candidate_set=0x7fffecaa7770) at librustc/traits/project.rs:1012
#151 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:817
--
#180 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#181 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013
#182 rustc::infer::InferCtxt::probe (self=<optimized out>, f=...) at librustc/infer/mod.rs:905
#183 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=<optimized out>, obligation_trait_ref=<optimized out>, candidate_set=0x7fffecaa98f0) at librustc/traits/project.rs:1012
#184 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:817
--
#213 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#214 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecaab9f0, obligation_trait_ref=<optimized out>) at librustc/traits/project.rs:1174
#215 rustc::traits::project::confirm_candidate (obligation=0x7fffecaab9f0, obligation_trait_ref=<optimized out>, selcx=<optimized out>, candidate=...) at librustc/traits/project.rs:1161
#216 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:878
#217 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=<optimized out>) at librustc/traits/project.rs:541
--
#245 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#246 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecaad910, obligation_trait_ref=<optimized out>) at librustc/traits/project.rs:1174
#247 rustc::traits::project::confirm_candidate (obligation=0x7fffecaad910, obligation_trait_ref=<optimized out>, selcx=<optimized out>, candidate=...) at librustc/traits/project.rs:1161
#248 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:878
#249 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=<optimized out>) at librustc/traits/project.rs:541
--
#277 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#278 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecaaf830, obligation_trait_ref=<optimized out>) at librustc/traits/project.rs:1174
#279 rustc::traits::project::confirm_candidate (obligation=0x7fffecaaf830, obligation_trait_ref=<optimized out>, selcx=<optimized out>, candidate=...) at librustc/traits/project.rs:1161
#280 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:878
#281 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=<optimized out>) at librustc/traits/project.rs:541
--
#309 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#310 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013
#311 rustc::infer::InferCtxt::probe (self=<optimized out>, f=...) at librustc/infer/mod.rs:905
#312 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=<optimized out>, obligation_trait_ref=<optimized out>, candidate_set=0x7fffecab17d0) at librustc/traits/project.rs:1012
#313 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:817
--
#342 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#343 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecab38d0, obligation_trait_ref=<optimized out>) at librustc/traits/project.rs:1174
#344 rustc::traits::project::confirm_candidate (obligation=0x7fffecab38d0, obligation_trait_ref=<optimized out>, selcx=<optimized out>, candidate=...) at librustc/traits/project.rs:1161
#345 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:878
#346 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=<optimized out>) at librustc/traits/project.rs:541
--
#374 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#375 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecab57f0, obligation_trait_ref=<optimized out>) at librustc/traits/project.rs:1174
#376 rustc::traits::project::confirm_candidate (obligation=0x7fffecab57f0, obligation_trait_ref=<optimized out>, selcx=<optimized out>, candidate=...) at librustc/traits/project.rs:1161
#377 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:878
#378 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=<optimized out>) at librustc/traits/project.rs:541
--
#406 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#407 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecab7710, obligation_trait_ref=<optimized out>) at librustc/traits/project.rs:1174
#408 rustc::traits::project::confirm_candidate (obligation=0x7fffecab7710, obligation_trait_ref=<optimized out>, selcx=<optimized out>, candidate=...) at librustc/traits/project.rs:1161
#409 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:878
#410 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=<optimized out>) at librustc/traits/project.rs:541
--
#438 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#439 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecab9630, obligation_trait_ref=<optimized out>) at librustc/traits/project.rs:1174
#440 rustc::traits::project::confirm_candidate (obligation=0x7fffecab9630, obligation_trait_ref=<optimized out>, selcx=<optimized out>, candidate=...) at librustc/traits/project.rs:1161
#441 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:878
#442 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=<optimized out>) at librustc/traits/project.rs:541
--
#470 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#471 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013
#472 rustc::infer::InferCtxt::probe (self=<optimized out>, f=...) at librustc/infer/mod.rs:905
#473 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=<optimized out>, obligation_trait_ref=<optimized out>, candidate_set=0x7fffecabb5d0) at librustc/traits/project.rs:1012
#474 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:817
--
#503 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#504 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013
#505 rustc::infer::InferCtxt::probe (self=<optimized out>, f=...) at librustc/infer/mod.rs:905
#506 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=<optimized out>, obligation_trait_ref=<optimized out>, candidate_set=0x7fffecabd750) at librustc/traits/project.rs:1012
#507 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:817

The cause in this case seems to be that at every Chain the rustc::traits::project::project_type function calls rustc::traits::select::SelectionContext::select twice: the first time from assemble_candidates_from_impls via InferCtxt::probe, the second time from rustc::traits::project::confirm_candidate. Both calls recurse on the next level, causing the exponential blowup.

tanriol on 8 Jan 2018

Thanks for the profiles everyone, btw, I've been spending a bit more time investigating too but don't yet have any insights to share. =)

nikomatsakis on 22 Jan 2018

@nikomatsakis can you share what problem have you stumbled over? Thanks to @tanriol's profile I think the root cause is quite clear; we just need to cache the call to eliminate the blowup. What I'm worrying is that SelectionContext::select does have side effects, and as I don't have much knowledge on projection I don't know if the second (redundant) execution can be easily removed.

ishitatsuyuki on 24 Jan 2018

This came up again in https://www.reddit.com/r/rust/comments/7v4kfj/impl_trait_vs_boxtrait_or_how_i_learned_to_stop/

jonhoo on 4 Feb 2018

👍6

Yeah, so this is related to deeply nested types itself, and it doesn't matter where it appear. As argument, return type, or even just creating on stack triggers this issue. #39684 #43757 should be duplicates.

ishitatsuyuki on 15 Feb 2018

Finding as of today:

The blowup roots from confirm_impl_candidate in select module (not project). Confirming candidate is a non-cachable operation. It then attempts to normalize the type (which doesn't seem to cache too) inside vtable_impl and recurs to next level.

cc @tanriol @nikomatsakis Please check if this clears up some obstacles. Primarily I'm wondering why normalization is continuously performed (maybe due to snapshot?).

ishitatsuyuki on 16 Feb 2018

Okay, I think I've successfully understood the overall picture of this blowup.

First, it's true that we're calling SelectionContext::select twice per iteration. This is what we probably need to reduce to fix the issue.

Second, the blowup occurs in operations that are not currently cached. In confirm_impl_candidate, we create an snapshot and then calls vtable_impl, then impl_or_trait_obligations. It's the place we recur onto next level, and because we are in a snapshot, the unification results are completely discarded after the operation. This is the second cause making the operation non-cachable: unification relies on side effects so it doesn't execute twice, but we revert the changes so it's computing every time.

Adding to that, impl_or_trait_obligations tries to normalize the inner type (?) for each predicate, which sounds like a multiplicative behavior here. However, this is ran without clearing the snapshot, which may make the situation better (it still sounds like an issue to me, though). Also, the reproduction example only used Iterator obligation which is the only one, thus the multiplicative behavior here hasn't been tested yet.

To sum up, we didn't get any short circuits here due to the two factors preventing caching from happening. Next, I'm proposing some solutions below, focusing on fixing the "calling twice" problem:

Try to preserve the snapshot for the impl candidate we resolve. We can reuse it later which eliminates the need of calling confirm_candidate the second time. However, as far I understood I'm not sure if snapshot is designed to be saved beyond scope (it always take a closure for pub APIs).
Do not call confirm_candidates the first time, instead call it only when we're really confirming it. My concern is that this may breaks code that relies on side effects (unification) which I don't understand yet.
This is rather focused on solving the normalization cache, and the idea is quite abstract. We can try to cache normalizations, but I have no idea what we should take as a key.

I need your expertise on this, so we can reach a solution that resolves the issue. Please give me your opinion!

ishitatsuyuki on 16 Feb 2018

Another update: the projection cache actually exists, but we're discarding them every time by wiping the InferCtxt (with snapshots). I have created a diagram to show how this recurs:
Diagram

So basically we need to implement this FIXME: https://github.com/rust-lang/rust/blob/554fe71ac6172ce9c0baeabbfd692280dcc893fe/src/librustc/traits/project.rs#L1548-L1550

ishitatsuyuki on 17 Feb 2018

❤11

@arielb1 You seem to have created this FIXME. Can you give me some hints on implementing cross-infcx cache?

ishitatsuyuki on 17 Feb 2018

I have launched a PoC in #48296 and I confirm that the exponential part should be resolved. I still observe some degree of polynomial time algorithms, which makes rustc chokes at about 50 chain().

ishitatsuyuki on 17 Feb 2018

@hcpl I suspect you copied the wrong code for the specialized version above, as I observe the same time-passes behavior and the code lacks main() despite you specify --crate-type bin for the specialized version.

Can you check again, and if possible, provide the correct specialized version code?

ishitatsuyuki on 17 Feb 2018

Update: although I have fixed the normalization part, the typeck still seems to be exponential on time, and very memory consuming. I'm investigating the cause of this second issue.

ishitatsuyuki on 18 Feb 2018

I have implemented a fix for the typeck part too, and I believe there's no more exponential algorithms in rustc anymore.

ishitatsuyuki on 18 Feb 2018

@ishitatsuyuki the code is correct, but both rustc invocations should have been with --crate-type=lib src/lib.rs (which were assumed to be in different crates). Sorry about that.

I've changed filenames to foo_generic.rs and foo_specialized.rs respectively to better reflect their purpose.

hcpl on 18 Feb 2018

Was this page helpful?

0 / 5 - 0 ratings