Compiling the postgres-tokio crate at https://github.com/sfackler/rust-postgres/commit/d27518ba76d76ccaa59b3ccd63e981bd8bd0ef33 goes from 5 seconds to 45 seconds on nightly if the two .boxed()
calls in the middle of this call chain are removed: https://github.com/sfackler/rust-postgres/blob/d27518ba76d76ccaa59b3ccd63e981bd8bd0ef33/postgres-tokio/src/lib.rs#L342-L408.
Looks like 15 seconds is spent in translation item collection, and 39 seconds is spent in translation:
time: 0.004 parsing
time: 0.005 recursion limit
time: 0.000 crate injection
time: 0.001 plugin loading
time: 0.000 plugin registration
time: 0.238 expansion
time: 0.000 maybe building test harness
time: 0.000 maybe creating a macro crate
time: 0.000 checking for inline asm in case the target doesn't support it
time: 0.000 complete gated feature checking
time: 0.008 early lint checks
time: 0.000 AST validation
time: 0.005 name resolution
time: 0.004 lowering ast -> hir
time: 0.002 indexing hir
time: 0.000 attribute checking
time: 0.001 language item collection
time: 0.001 lifetime resolution
time: 0.000 looking for entry point
time: 0.000 looking for plugin registrar
time: 0.001 region resolution
time: 0.000 loop checking
time: 0.000 static item recursion checking
time: 0.005 compute_incremental_hashes_map
time: 0.000 load_dep_graph
time: 0.000 stability index
time: 0.001 stability checking
time: 0.011 type collecting
time: 0.000 variance inference
time: 0.000 impl wf inference
time: 0.021 coherence checking
time: 0.008 wf checking
time: 0.003 item-types checking
time: 0.500 item-bodies checking
time: 0.000 drop-impl checking
time: 0.003 const checking
time: 0.001 privacy checking
time: 0.000 intrinsic checking
time: 0.000 effect checking
time: 0.001 match checking
time: 0.000 liveness checking
time: 0.002 rvalue checking
time: 0.014 MIR dump
time: 0.001 SimplifyCfg
time: 0.002 QualifyAndPromoteConstants
time: 0.002 TypeckMir
time: 0.000 SimplifyBranches
time: 0.001 SimplifyCfg
time: 0.006 MIR cleanup and validation
time: 0.006 borrow checking
time: 0.000 reachability checking
time: 0.000 death checking
time: 0.000 unused lib feature checking
time: 1.366 lint checking
time: 0.000 resolving dependency formats
time: 0.000 NoLandingPads
time: 0.001 SimplifyCfg
time: 0.001 EraseRegions
time: 0.000 AddCallGuards
time: 0.008 ElaborateDrops
time: 0.000 NoLandingPads
time: 0.001 SimplifyCfg
time: 0.001 InstCombine
time: 0.000 Deaggregator
time: 0.000 CopyPropagation
time: 0.001 SimplifyLocals
time: 0.000 AddCallGuards
time: 0.000 PreTrans
time: 0.015 MIR optimisations
time: 0.001 write metadata
time: 15.086 translation item collection
time: 0.020 codegen unit partitioning
time: 0.022 internalize symbols
time: 39.376 translation
time: 0.000 assert dep graph
time: 0.000 serialize dep graph
time: 0.089 llvm function passes [0]
time: 0.037 llvm module passes [0]
time: 2.148 codegen passes [0]
time: 0.001 codegen passes [0]
time: 2.432 LLVM passes
time: 0.000 serialize work products
time: 0.084 linking
Things are significantly worse on 1.13 - 2 minutes in translation!
Some discussion in IRC: https://botbot.me/mozilla/rust-internals/2016-12-22/?msg=78294648&page=1
cc @aturon
UPDATE: #40280 was closed as a duplicate of this. It had the following sample code:
let &(first, B, C, D, E, F, G, H) = self;
let iter = first.buffers_list();
let iter = iter.chain(B.buffers_list());
let iter = iter.chain(C.buffers_list());
let iter = iter.chain(D.buffers_list());
let iter = iter.chain(E.buffers_list());
let iter = iter.chain(F.buffers_list());
let iter = iter.chain(G.buffers_list());
let iter = iter.chain(H.buffers_list());
Box::new(iter)
--nmatsakis
Things are significantly worse on 1.13 - 2 minutes in translation!
This may be the effect of the projection cache.
cc @alexcrichton This is obviously a show-stopper for futures.
I can do some profiling. I've had some plans for improving collection/trans that I think may be related. One question to try and answer is what %age of this is just "we are making more code" vs "we are wasting time doing things in trait selection that could be cached". I have observed the latter from time to time and had some thoughts on how to fix it.
I've also seen this before, with tokio-socks5 as well. Removing just a handful of the trait objects in that file makes the compile time of the crate shoot from 2.34s to 89.52s (!!)
Here's a simple example of something that takes a very long time to compile:
future::ok::<(),()>(()).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(()));
Closed #40280 as a duplicate, moved some example code into the issue header.
Just a stupid question ‒ the „translation item collection“, which is one of the two culprits here, is listed in the „MIR optimisations“. Does it really have to happen at all on non-optimised debug build?
It's natural to read it that way, but the headers are actual after their group, so it's part of translation
This seems to be hard enough to take some time fixing. So I thought of a workaround, if someone is also interested. It uses the trick with placing trait objects to split the chains of modifiers, but only on testing/debug builds where the compilation speed matters, while it keeps the complex but hopefully faster concrete types in release build:
This one is for streams (that's what I needed), but can obviously work for futures or other things as well:
#[cfg(debug_assertions)]
fn test_boxed<T, E, S>(s: S) -> Box<Stream<Item = T, Error = E>>
where S: Stream<Item = T, Error = E> + 'static
{
Box::new(s)
}
#[cfg(not(debug_assertions))]
fn test_boxed<S>(s: S) -> S {
s
}
(I didn't find any better config option than the debug_assertions
, but I hope that one is good enough)
Nominating in place of https://github.com/rust-lang/rust/issues/42941. cc @eddyb (nominator)
Repeating another bad case from #42941:
extern crate futures;
use futures::{future, IntoFuture, Future};
fn main() {
let t: std::result::Result<(), ()> = Ok(());
let f = t
.into_future()
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()))
.and_then(|_| future::ok(()));
f.wait();
}
The code above takes ~750s to compile on my laptop (you can make it shorter/longer by removing/adding .and_then
calls):
...
time: 0.000; rss: 108MB write metadata
time: 472.840; rss: 110MB translation item collection
time: 0.001; rss: 112MB codegen unit partitioning
time: 0.001; rss: 133MB internalize symbols
time: 732.741; rss: 133MB translation
...
The extra ~250s is spent between codegen unit partitioning
and internalize symbols
.
Another observation from #42941 about why this is becoming even more important is that the newly-released hyper 0.11
uses futures everywhere, which will likely cause many more programs to use chains of futures like this going forward.
triage: P-high
We are raising to "high priority" to at least do some investigation and try to determine whether revised trait solving strategies will be of use here.
status: waiting for niko
Bump. This really hurts impl Trait
(boxing hides this issue).
I'm not sure if this is fixed... I'm on nightly, version 1.23.0-nightly
on Arch Linux...
I get the following in -Ztime-passes
:
time: 0.001; rss: 57MB parsing
time: 0.000; rss: 57MB recursion limit
time: 0.000; rss: 57MB crate injection
time: 0.000; rss: 57MB plugin loading
time: 0.000; rss: 57MB plugin registration
time: 0.054; rss: 122MB expansion
time: 0.000; rss: 122MB maybe building test harness
time: 0.000; rss: 122MB maybe creating a macro crate
time: 0.000; rss: 122MB creating allocators
time: 0.000; rss: 122MB checking for inline asm in case the target doesn't support it
time: 0.000; rss: 122MB AST validation
time: 0.006; rss: 128MB name resolution
time: 0.000; rss: 128MB complete gated feature checking
time: 0.000; rss: 128MB lowering ast -> hir
time: 0.000; rss: 128MB early lint checks
time: 0.001; rss: 128MB indexing hir
time: 0.000; rss: 128MB attribute checking
time: 0.000; rss: 128MB lifetime resolution
time: 0.000; rss: 128MB looking for entry point
time: 0.000; rss: 128MB looking for plugin registrar
time: 0.000; rss: 128MB loop checking
time: 0.000; rss: 128MB static item recursion checking
time: 0.000; rss: 132MB stability checking
time: 0.001; rss: 135MB type collecting
time: 0.000; rss: 135MB outlives testing
time: 0.000; rss: 135MB impl wf inference
time: 0.000; rss: 135MB coherence checking
time: 0.000; rss: 135MB variance testing
time: 0.001; rss: 139MB wf checking
time: 0.000; rss: 139MB item-types checking
time: 0.080; rss: 173MB item-bodies checking
time: 0.009; rss: 177MB const checking
time: 0.001; rss: 177MB privacy checking
time: 0.000; rss: 177MB intrinsic checking
time: 0.000; rss: 177MB match checking
time: 0.000; rss: 177MB liveness checking
time: 0.679; rss: 177MB borrow checking
time: 0.000; rss: 177MB MIR borrow checking
time: 0.000; rss: 177MB MIR effect checking
time: 0.000; rss: 177MB death checking
time: 0.000; rss: 177MB unused lib feature checking
time: 0.009; rss: 183MB lint checking
time: 0.000; rss: 183MB resolving dependency formats
time: 0.000; rss: 183MB write metadata
so this looks very similar. In fact, I too have code with a bunch of:
future_maker()
.and_then( futurey_stuff())
.and_then( futurey_stuff())
.and_then( futurey_stuff())
.and_then( futurey_stuff())
.and_then( futurey_stuff())
.and_then( futurey_stuff())
Sorry if it's a bother... just wasn't sure where to ask and this seems to be the most recent reference.
Does it matter that some of the and_then
s have move closures instead?
Of course not. Particularly for the futures case generators will help, but @nikomatsakis doesn't seem to be working on this.
Just to double-check, that'd be std::ops::Generator
?
Why would generators help? It's a trait, it doesn't do away with the deeply nested types.
Generators help because they are an alternative to combinators. When you write a generator, you get a single (anonymous) type for the entire state machine; when you use combinators, you build the state machine out of nested types.
triage: P-medium
I'm going to lower this from P-high to reflect reality. I'm still eager to investigate but haven't had time, and making it P-high is not helping =)
So I finally did some work here. In particular, I extracted out a self-contained example that seems to reproduce the problem, which let me do a bit of experimentation. The gist contains two variants of the example; one that models Future
quite closely (though without IntoFuture
), and the other which removes the Output
type. The timing results are quite interesting. See if you can spot the outlier. ;)
size
variant 4 8 12 16
full .19s .34s 2.5s 36.9s
no-output .19s .26s .9s 9.8s
I haven't had time to dig deeper than that yet.
These can probably be considered even more self-contained than what @nikomatsakis have made in this post, and IMO they show the issue more consistently (reproducible exponential blowups by a factor of approximately 2 for each line of .chain(empty())
in every compilation step I specifically highlighted below) so I'll post them:
Code
pub fn foo<T: 'static>() -> Box<Iterator<Item = [T; 17]>> {
use std::iter::empty;
Box::new(empty()
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty()) // 10th .chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty()) // 17th .chain(empty())
)
}
$ rustc +nightly --crate-type=lib foo_generic.rs -Z time-passes
...
time: 34.801; rss: 92MB item-bodies checking
...
All other timings are <0.3s.
Full output
time: 0.001; rss: 43MB parsing
time: 0.000; rss: 43MB recursion limit
time: 0.000; rss: 43MB crate injection
time: 0.000; rss: 43MB plugin loading
time: 0.000; rss: 43MB plugin registration
time: 0.068; rss: 68MB expansion
time: 0.000; rss: 68MB maybe building test harness
time: 0.000; rss: 68MB maybe creating a macro crate
time: 0.000; rss: 68MB creating allocators
time: 0.000; rss: 68MB checking for inline asm in case the target doesn't support it
time: 0.000; rss: 68MB AST validation
time: 0.006; rss: 70MB name resolution
time: 0.000; rss: 70MB complete gated feature checking
time: 0.000; rss: 71MB lowering ast -> hir
time: 0.000; rss: 71MB early lint checks
time: 0.000; rss: 71MB indexing hir
time: 0.000; rss: 71MB attribute checking
time: 0.000; rss: 71MB load query result cache
time: 0.000; rss: 71MB lifetime resolution
time: 0.000; rss: 71MB looking for entry point
time: 0.000; rss: 71MB looking for plugin registrar
time: 0.000; rss: 71MB loop checking
time: 0.000; rss: 71MB static item recursion checking
time: 0.000; rss: 71MB stability checking
time: 0.001; rss: 75MB type collecting
time: 0.000; rss: 75MB outlives testing
time: 0.000; rss: 75MB impl wf inference
time: 0.000; rss: 75MB coherence checking
time: 0.000; rss: 75MB variance testing
time: 0.005; rss: 80MB wf checking
time: 0.000; rss: 80MB item-types checking
time: 34.801; rss: 92MB item-bodies checking
time: 0.012; rss: 95MB const checking
time: 0.001; rss: 95MB privacy checking
time: 0.000; rss: 95MB intrinsic checking
time: 0.000; rss: 95MB match checking
time: 0.000; rss: 95MB liveness checking
time: 0.003; rss: 95MB borrow checking
time: 0.000; rss: 95MB MIR borrow checking
time: 0.000; rss: 95MB MIR effect checking
time: 0.000; rss: 95MB death checking
time: 0.000; rss: 95MB unused lib feature checking
time: 0.018; rss: 100MB lint checking
time: 0.000; rss: 100MB resolving dependency formats
time: 0.002; rss: 100MB write metadata
time: 0.000; rss: 100MB translation item collection
time: 0.000; rss: 100MB codegen unit partitioning
time: 0.000; rss: 114MB translate to LLVM IR
time: 0.000; rss: 114MB assert dep graph
time: 0.000; rss: 114MB serialize dep graph
time: 0.267; rss: 114MB translation
time: 0.000; rss: 114MB llvm function passes [lib0]
time: 0.000; rss: 114MB llvm module passes [lib0]
time: 0.001; rss: 115MB codegen passes [lib0]
time: 0.005; rss: 115MB LLVM passes
time: 0.000; rss: 113MB serialize work products
time: 0.001; rss: 113MB linking
Code
pub fn foo() -> Box<Iterator<Item = ()>> {
use std::iter::empty;
Box::new(empty()
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty()) // 10th .chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty())
.chain(empty()) // 17th .chain(empty())
)
}
$ rustc +nightly --crate-type=lib foo_specialized.rs -Z time-passes
...
time: 43.160; rss: 89MB item-bodies checking
...
time: 34.286; rss: 99MB translation item collection
...
time: 129.115; rss: 122MB translate to LLVM IR
...
time: 163.680; rss: 122MB translation
...
time: 28.048; rss: 122MB LLVM passes
...
All other timings are <0.2s.
Full output
time: 0.001; rss: 41MB parsing
time: 0.000; rss: 41MB recursion limit
time: 0.000; rss: 41MB crate injection
time: 0.000; rss: 41MB plugin loading
time: 0.000; rss: 41MB plugin registration
time: 0.057; rss: 66MB expansion
time: 0.000; rss: 66MB maybe building test harness
time: 0.000; rss: 66MB maybe creating a macro crate
time: 0.000; rss: 66MB creating allocators
time: 0.000; rss: 66MB checking for inline asm in case the target doesn't support it
time: 0.000; rss: 66MB AST validation
time: 0.006; rss: 68MB name resolution
time: 0.000; rss: 68MB complete gated feature checking
time: 0.000; rss: 69MB lowering ast -> hir
time: 0.000; rss: 69MB early lint checks
time: 0.000; rss: 69MB indexing hir
time: 0.000; rss: 69MB attribute checking
time: 0.000; rss: 69MB load query result cache
time: 0.000; rss: 69MB lifetime resolution
time: 0.000; rss: 69MB looking for entry point
time: 0.000; rss: 69MB looking for plugin registrar
time: 0.000; rss: 69MB loop checking
time: 0.000; rss: 69MB static item recursion checking
time: 0.000; rss: 69MB stability checking
time: 0.001; rss: 73MB type collecting
time: 0.000; rss: 73MB outlives testing
time: 0.000; rss: 73MB impl wf inference
time: 0.000; rss: 73MB coherence checking
time: 0.000; rss: 73MB variance testing
time: 0.002; rss: 75MB wf checking
time: 0.000; rss: 75MB item-types checking
time: 43.160; rss: 89MB item-bodies checking
time: 0.014; rss: 91MB const checking
time: 0.001; rss: 91MB privacy checking
time: 0.000; rss: 91MB intrinsic checking
time: 0.000; rss: 91MB match checking
time: 0.000; rss: 91MB liveness checking
time: 0.003; rss: 94MB borrow checking
time: 0.000; rss: 94MB MIR borrow checking
time: 0.000; rss: 94MB MIR effect checking
time: 0.000; rss: 94MB death checking
time: 0.000; rss: 94MB unused lib feature checking
time: 0.018; rss: 96MB lint checking
time: 0.000; rss: 96MB resolving dependency formats
time: 0.001; rss: 96MB write metadata
time: 34.286; rss: 99MB translation item collection
time: 0.001; rss: 99MB codegen unit partitioning
time: 0.000; rss: 114MB llvm function passes [lib0]
time: 0.000; rss: 114MB llvm module passes [lib0]
time: 0.006; rss: 114MB llvm function passes [lib4]
time: 0.000; rss: 115MB llvm function passes [lib2]
time: 0.000; rss: 115MB llvm module passes [lib2]
time: 0.002; rss: 115MB llvm module passes [lib4]
time: 0.001; rss: 117MB llvm function passes [lib5]
time: 0.000; rss: 118MB llvm module passes [lib5]
time: 0.000; rss: 117MB llvm function passes [lib3]
time: 0.000; rss: 118MB llvm module passes [lib3]
time: 0.011; rss: 118MB codegen passes [lib2]
time: 0.011; rss: 117MB codegen passes [lib0]
time: 0.000; rss: 120MB llvm function passes [lib8]
time: 0.000; rss: 121MB llvm module passes [lib8]
time: 0.012; rss: 122MB codegen passes [lib5]
time: 0.004; rss: 120MB codegen passes [lib3]
time: 0.008; rss: 122MB codegen passes [lib8]
time: 0.134; rss: 122MB codegen passes [lib4]
time: 0.001; rss: 122MB llvm function passes [lib1]
time: 0.000; rss: 122MB llvm function passes [lib6]
time: 0.001; rss: 122MB llvm module passes [lib1]
time: 0.000; rss: 122MB llvm module passes [lib6]
time: 0.000; rss: 122MB llvm function passes [lib7]
time: 0.000; rss: 122MB llvm module passes [lib7]
time: 129.115; rss: 122MB translate to LLVM IR
time: 0.000; rss: 122MB assert dep graph
time: 0.000; rss: 122MB serialize dep graph
time: 163.680; rss: 122MB translation
time: 0.000; rss: 121MB llvm function passes [lib9]
time: 0.000; rss: 121MB llvm module passes [lib9]
time: 0.003; rss: 121MB codegen passes [lib9]
time: 0.008; rss: 121MB codegen passes [lib7]
time: 0.009; rss: 122MB codegen passes [lib1]
time: 0.011; rss: 122MB codegen passes [lib6]
time: 28.048; rss: 122MB LLVM passes
time: 0.000; rss: 122MB serialize work products
time: 0.003; rss: 122MB linking
Rustc version:
$ rustc +nightly --version
rustc 1.23.0-nightly (827cb0d61 2017-11-26)
I'm digging on this. Some easy profiling:
After some digging it seems that https://github.com/rust-lang/rust/blob/ebddfcb0b19471d2277e05cfbc2ad8caa81637b7/src/librustc/ty/subst.rs#L324 may be a cause.
I only have little knowledge regarding the compiler, but if I'm correct this relates to @nikomatsakis experiment on the number of output types. This looks like a place we can do a exponential blowup, by the factor of parameters.
I've tried profiling rustc and getting a backtrace for the generic version by @hcpl above at some random moment of time.
Likely relevant lines from the backtrace
#16 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#17 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013
#18 rustc::infer::InferCtxt::probe (self=<optimized out>, f=...) at librustc/infer/mod.rs:905
#19 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=<optimized out>, obligation_trait_ref=<optimized out>, candidate_set=0x7fffeca9f3d0) at librustc/traits/project.rs:1012
#20 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:817
--
#49 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#50 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013
#51 rustc::infer::InferCtxt::probe (self=<optimized out>, f=...) at librustc/infer/mod.rs:905
#52 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=<optimized out>, obligation_trait_ref=<optimized out>, candidate_set=0x7fffecaa1550) at librustc/traits/project.rs:1012
#53 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:817
--
#82 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#83 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecaa3650, obligation_trait_ref=<optimized out>) at librustc/traits/project.rs:1174
#84 rustc::traits::project::confirm_candidate (obligation=0x7fffecaa3650, obligation_trait_ref=<optimized out>, selcx=<optimized out>, candidate=...) at librustc/traits/project.rs:1161
#85 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:878
#86 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=<optimized out>) at librustc/traits/project.rs:541
--
#114 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#115 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013
#116 rustc::infer::InferCtxt::probe (self=<optimized out>, f=...) at librustc/infer/mod.rs:905
#117 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=<optimized out>, obligation_trait_ref=<optimized out>, candidate_set=0x7fffecaa55f0) at librustc/traits/project.rs:1012
#118 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:817
--
#147 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#148 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013
#149 rustc::infer::InferCtxt::probe (self=<optimized out>, f=...) at librustc/infer/mod.rs:905
#150 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=<optimized out>, obligation_trait_ref=<optimized out>, candidate_set=0x7fffecaa7770) at librustc/traits/project.rs:1012
#151 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:817
--
#180 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#181 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013
#182 rustc::infer::InferCtxt::probe (self=<optimized out>, f=...) at librustc/infer/mod.rs:905
#183 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=<optimized out>, obligation_trait_ref=<optimized out>, candidate_set=0x7fffecaa98f0) at librustc/traits/project.rs:1012
#184 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:817
--
#213 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#214 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecaab9f0, obligation_trait_ref=<optimized out>) at librustc/traits/project.rs:1174
#215 rustc::traits::project::confirm_candidate (obligation=0x7fffecaab9f0, obligation_trait_ref=<optimized out>, selcx=<optimized out>, candidate=...) at librustc/traits/project.rs:1161
#216 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:878
#217 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=<optimized out>) at librustc/traits/project.rs:541
--
#245 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#246 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecaad910, obligation_trait_ref=<optimized out>) at librustc/traits/project.rs:1174
#247 rustc::traits::project::confirm_candidate (obligation=0x7fffecaad910, obligation_trait_ref=<optimized out>, selcx=<optimized out>, candidate=...) at librustc/traits/project.rs:1161
#248 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:878
#249 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=<optimized out>) at librustc/traits/project.rs:541
--
#277 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#278 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecaaf830, obligation_trait_ref=<optimized out>) at librustc/traits/project.rs:1174
#279 rustc::traits::project::confirm_candidate (obligation=0x7fffecaaf830, obligation_trait_ref=<optimized out>, selcx=<optimized out>, candidate=...) at librustc/traits/project.rs:1161
#280 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:878
#281 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=<optimized out>) at librustc/traits/project.rs:541
--
#309 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#310 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013
#311 rustc::infer::InferCtxt::probe (self=<optimized out>, f=...) at librustc/infer/mod.rs:905
#312 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=<optimized out>, obligation_trait_ref=<optimized out>, candidate_set=0x7fffecab17d0) at librustc/traits/project.rs:1012
#313 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:817
--
#342 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#343 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecab38d0, obligation_trait_ref=<optimized out>) at librustc/traits/project.rs:1174
#344 rustc::traits::project::confirm_candidate (obligation=0x7fffecab38d0, obligation_trait_ref=<optimized out>, selcx=<optimized out>, candidate=...) at librustc/traits/project.rs:1161
#345 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:878
#346 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=<optimized out>) at librustc/traits/project.rs:541
--
#374 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#375 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecab57f0, obligation_trait_ref=<optimized out>) at librustc/traits/project.rs:1174
#376 rustc::traits::project::confirm_candidate (obligation=0x7fffecab57f0, obligation_trait_ref=<optimized out>, selcx=<optimized out>, candidate=...) at librustc/traits/project.rs:1161
#377 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:878
#378 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=<optimized out>) at librustc/traits/project.rs:541
--
#406 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#407 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecab7710, obligation_trait_ref=<optimized out>) at librustc/traits/project.rs:1174
#408 rustc::traits::project::confirm_candidate (obligation=0x7fffecab7710, obligation_trait_ref=<optimized out>, selcx=<optimized out>, candidate=...) at librustc/traits/project.rs:1161
#409 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:878
#410 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=<optimized out>) at librustc/traits/project.rs:541
--
#438 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#439 0x00007ffff2f8c276 in rustc::traits::project::confirm_select_candidate (selcx=0x7fffecabfbc0, obligation=0x7fffecab9630, obligation_trait_ref=<optimized out>) at librustc/traits/project.rs:1174
#440 rustc::traits::project::confirm_candidate (obligation=0x7fffecab9630, obligation_trait_ref=<optimized out>, selcx=<optimized out>, candidate=...) at librustc/traits/project.rs:1161
#441 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:878
#442 0x00007ffff2f88038 in rustc::traits::project::opt_normalize_projection_type (selcx=0x7fffecabfbc0, param_env=..., projection_ty=..., cause=..., depth=<optimized out>) at librustc/traits/project.rs:541
--
#470 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#471 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013
#472 rustc::infer::InferCtxt::probe (self=<optimized out>, f=...) at librustc/infer/mod.rs:905
#473 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=<optimized out>, obligation_trait_ref=<optimized out>, candidate_set=0x7fffecabb5d0) at librustc/traits/project.rs:1012
#474 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:817
--
#503 0x00007ffff2f44dbb in rustc::traits::select::SelectionContext::select (self=0x7fffecabfbc0, obligation=0x7fffeca9e488) at librustc/traits/select.rs:526
#504 0x00007ffff3607861 in rustc::traits::project::assemble_candidates_from_impls::{{closure}} () at librustc/traits/project.rs:1013
#505 rustc::infer::InferCtxt::probe (self=<optimized out>, f=...) at librustc/infer/mod.rs:905
#506 0x00007ffff2f8b99f in rustc::traits::project::assemble_candidates_from_impls (selcx=0x7fffecabfbc0, obligation=<optimized out>, obligation_trait_ref=<optimized out>, candidate_set=0x7fffecabd750) at librustc/traits/project.rs:1012
#507 rustc::traits::project::project_type (selcx=<optimized out>, obligation=<optimized out>) at librustc/traits/project.rs:817
The cause in this case seems to be that at every Chain
the rustc::traits::project::project_type
function calls rustc::traits::select::SelectionContext::select
twice: the first time from assemble_candidates_from_impls
via InferCtxt::probe
, the second time from rustc::traits::project::confirm_candidate
. Both calls recurse on the next level, causing the exponential blowup.
Thanks for the profiles everyone, btw, I've been spending a bit more time investigating too but don't yet have any insights to share. =)
@nikomatsakis can you share what problem have you stumbled over? Thanks to @tanriol's profile I think the root cause is quite clear; we just need to cache the call to eliminate the blowup. What I'm worrying is that SelectionContext::select
does have side effects, and as I don't have much knowledge on projection I don't know if the second (redundant) execution can be easily removed.
Yeah, so this is related to deeply nested types itself, and it doesn't matter where it appear. As argument, return type, or even just creating on stack triggers this issue. #39684 #43757 should be duplicates.
Finding as of today:
The blowup roots from confirm_impl_candidate
in select module (not project). Confirming candidate is a non-cachable operation. It then attempts to normalize the type (which doesn't seem to cache too) inside vtable_impl
and recurs to next level.
cc @tanriol @nikomatsakis Please check if this clears up some obstacles. Primarily I'm wondering why normalization is continuously performed (maybe due to snapshot?).
Okay, I think I've successfully understood the overall picture of this blowup.
First, it's true that we're calling SelectionContext::select
twice per iteration. This is what we probably need to reduce to fix the issue.
Second, the blowup occurs in operations that are not currently cached. In confirm_impl_candidate
, we create an snapshot and then calls vtable_impl
, then impl_or_trait_obligations
. It's the place we recur onto next level, and because we are in a snapshot, the unification results are completely discarded after the operation. This is the second cause making the operation non-cachable: unification relies on side effects so it doesn't execute twice, but we revert the changes so it's computing every time.
Adding to that, impl_or_trait_obligations
tries to normalize the inner type (?) for each predicate, which sounds like a multiplicative behavior here. However, this is ran without clearing the snapshot, which may make the situation better (it still sounds like an issue to me, though). Also, the reproduction example only used Iterator obligation which is the only one, thus the multiplicative behavior here hasn't been tested yet.
To sum up, we didn't get any short circuits here due to the two factors preventing caching from happening. Next, I'm proposing some solutions below, focusing on fixing the "calling twice" problem:
Try to preserve the snapshot for the impl candidate we resolve. We can reuse it later which eliminates the need of calling confirm_candidate the second time. However, as far I understood I'm not sure if snapshot is designed to be saved beyond scope (it always take a closure for pub APIs).
Do not call confirm_candidates
the first time, instead call it only when we're really confirming it. My concern is that this may breaks code that relies on side effects (unification) which I don't understand yet.
This is rather focused on solving the normalization cache, and the idea is quite abstract. We can try to cache normalizations, but I have no idea what we should take as a key.
I need your expertise on this, so we can reach a solution that resolves the issue. Please give me your opinion!
Another update: the projection cache actually exists, but we're discarding them every time by wiping the InferCtxt (with snapshots). I have created a diagram to show how this recurs:
So basically we need to implement this FIXME: https://github.com/rust-lang/rust/blob/554fe71ac6172ce9c0baeabbfd692280dcc893fe/src/librustc/traits/project.rs#L1548-L1550
@arielb1 You seem to have created this FIXME. Can you give me some hints on implementing cross-infcx cache?
I have launched a PoC in #48296 and I confirm that the exponential part should be resolved. I still observe some degree of polynomial time algorithms, which makes rustc chokes at about 50 chain()
.
@hcpl I suspect you copied the wrong code for the specialized version above, as I observe the same time-passes behavior and the code lacks main()
despite you specify --crate-type bin
for the specialized version.
Can you check again, and if possible, provide the correct specialized version code?
Update: although I have fixed the normalization part, the typeck still seems to be exponential on time, and very memory consuming. I'm investigating the cause of this second issue.
I have implemented a fix for the typeck part too, and I believe there's no more exponential algorithms in rustc anymore.
@ishitatsuyuki the code is correct, but both rustc
invocations should have been with --crate-type=lib src/lib.rs
(which were assumed to be in different crates). Sorry about that.
I've changed filenames to foo_generic.rs
and foo_specialized.rs
respectively to better reflect their purpose.
Most helpful comment
Another update: the projection cache actually exists, but we're discarding them every time by wiping the InferCtxt (with snapshots). I have created a diagram to show how this recurs:

So basically we need to implement this FIXME: https://github.com/rust-lang/rust/blob/554fe71ac6172ce9c0baeabbfd692280dcc893fe/src/librustc/traits/project.rs#L1548-L1550