Rust: Exponential compile-time and type_length_limit blowup when nesting closure wrappers

Created on 24 Sep 2018 · 24 comments · Source: rust-lang/rust

I think this is better explained by a code snippet:

#![recursion_limit="128"]
#![type_length_limit="67108864"]

fn main() {
    let s = S19::default();
    s.call_me((), &|()| ());
}

trait CallMe<T: Copy> {
    fn call_me<F: Fn(T)>(&self, t: T, f: &F);
}

impl<T: Copy> CallMe<T> for () {
    fn call_me<F: Fn(T)>(&self, _: T, _: &F) {}
}

#[derive(Default, Debug)]
struct S<T>(T, T);

impl<T: Copy, P: CallMe<T>> CallMe<T> for S<P> {
    fn call_me<F: Fn(T)>(&self, t: T, f: &F) {
        let wrapped = |t: _| {
            f(t);
        };
        self.0.call_me(t, &wrapped);
        self.1.call_me(t, &wrapped);
    }
}

type S0 = S<()>;
type S1 = S<S0>;
type S2 = S<S1>;
type S3 = S<S2>;
type S4 = S<S3>;
type S5 = S<S4>;
type S6 = S<S5>;
type S7 = S<S6>;
type S8 = S<S7>;
type S9 = S<S8>;
type S10 = S<S9>;
type S11 = S<S10>;
type S12 = S<S11>;
type S13 = S<S12>;
type S14 = S<S13>;
type S15 = S<S14>;
type S16 = S<S15>;
type S17 = S<S16>;
type S18 = S<S17>;
type S19 = S<S18>;

What it does: at each level it creates a closure that wraps the closure received from the level above. As the number of levels grows, the compile time and the needed type_length_limit grow exponentially.

However, if I change the two calls to:

self.0.call_me(t, &f);
self.1.call_me(t, &f);

then the blow-up goes away and it compiles almost instantaneously (I tried it up to 120 levels, without increasing the type length limit), so nesting the S structures like this is fine on its own. But if I change only one of the calls (or even remove it) and leave just one call to the wrapper, the blowup is still there. Yet there seems to be nothing about the wrapper that should make it exponential: it contains just one call to the wrapped f, captures nothing besides f, and if I look at the code, I still count only one distinct closure on each level.

The compile times (in seconds) I've measured with different numbers of levels:

11: 0.5
12: 0.74
13: 0.97
14: 1.56
15: 2.66
16: 5.02
17: 9.16
18: 18.50
19: 37.72

This is happening both on stable and nightly:

rustc 1.30.0-nightly (63c75d375 2018-09-21)
binary: rustc
commit-hash: 63c75d375d76d0bcc586f9cb5520ced24bc58450
commit-date: 2018-09-21
host: x86_64-unknown-linux-gnu
release: 1.30.0-nightly
LLVM version: 8.0

@ishitatsuyuki mentioned on IRC that he would want to have a look.

Labels: A-closures, I-compiletime, P-medium, T-compiler

All 24 comments

Another thing I've noticed: cargo check does not trigger the limits and is fast no matter how many levels I put there (but I guess these limits apply to phases that cargo check doesn't run).

cc #54175 @nikomatsakis

Well, type_length_limit is a monomorphization-time check and it's used for symbol-name lengths. It's no wonder the blowup happened, since you actually raised the type_length_limit.

Since type aliases are identical to the original type, we can't generate things like debug symbols of reasonable length in this case. We'd have to combine both caching and digesting to make the symbols shorter.

How is it possible, then, that no limit is triggered if I call f directly instead of the wrapper? The compiler still needs to create a symbol name for the call_me method of each S, each with a very long name. What I'm trying to point out here is that it all works like a charm without the wrapper closure, yet that closure doesn't bring anything special to the table, so why does it make a difference?

And I still don't think the compiler would take 30 seconds to create 20 or so symbol names, each 6 MB long.

What I'm trying to say is: I understand that if a bigger limit is needed, it takes longer. But I don't think the bigger limit should be needed here. Or, at least, I don't see why it should be.

@vorner We don't put types in symbol names except for the X in <X as Trait>::method, so I think that's likely what the closure is doing here.

If we move to proper C++-like mangling, we can use the compression features and bypass this problem entirely.

EDIT: I likely was wrong, see below.

Triage: pending pre-RFC that aims to solve this: https://internals.rust-lang.org/t/pre-rfc-a-new-symbol-mangling-scheme

After working on the parts of rustc relevant to this issue, I'm pretty sure none of those types are getting into symbol names, but rather this is type_length_limit's walk over the generic arguments:
https://github.com/rust-lang/rust/blob/72bc62047f7fd9219bcd884d399c98463f2978ae/src/librustc_mir/monomorphize/collector.rs#L470

The reason closures trigger this is that F is both a type parameter of the closure and a capture, so the walk above doubles at every level (and it's breadth-first, so I assume its memory allocations are responsible for the slowdown).

We can trigger the same type_length_limit behavior with (F, F) instead of a closure, and so I managed to reproduce the slowdown without closures (or really any calls):



#![type_length_limit="8388607"]

pub const MAIN: fn() = <S21 as Exp<()>>::exp as fn();

trait Exp<X> {
    fn exp();
}

impl<X> Exp<X> for () {
    fn exp() {}
}

impl<T: Exp<(X, X)>, X> Exp<X> for (T,) {
    fn exp() {
        <T as Exp<(X, X)>>::exp as fn();
    }
}

type S<T> = (T,);
type S0 = S<()>;
type S1 = S<S0>;
type S2 = S<S1>;
type S3 = S<S2>;
type S4 = S<S3>;
type S5 = S<S4>;
type S6 = S<S5>;
type S7 = S<S6>;
type S8 = S<S7>;
type S9 = S<S8>;
type S10 = S<S9>;
type S11 = S<S10>;
type S12 = S<S11>;
type S13 = S<S12>;
type S14 = S<S13>;
type S15 = S<S14>;
type S16 = S<S15>;
type S17 = S<S16>;
type S18 = S<S17>;
type S19 = S<S18>;
type S20 = S<S19>;
type S21 = S<S20>;

Note that a downside of using (X, X) instead of a closure is that tuples are actually exponential in debuginfo, unlike closures (which only show their captures), so you either need to compile unoptimized w/o debuginfo, or in release mode (e.g. on the playground - it'd be nice if it also had the former, cc @lqd).


What can you (@vorner) do?

Not much, I'm afraid; you can't avoid this issue if you need a closure with a capture.
However, if you don't need the Fn() trait (or at least not for the code that contains the wrapper closure), you can build your own Fn-like trait, and instead of a closure you can use a struct that is parameterized by F only once (unlike the closure, which ends up doing it twice); a sketch of this follows below.
Or you can wait for one of the fixes below (but the issue will, of course, persist in older versions).
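
To make that middle option concrete, here is a minimal sketch of the idea; the Call, Wrap, and FromFn names are made up for illustration and are not from the issue or any library. The hand-rolled Fn-like trait is implemented by a reference-holding wrapper struct whose type mentions F exactly once, so at depth N the callback type nests linearly (Wrap<Wrap<...FromFn<_>>>) instead of doubling the way a capturing closure's generic arguments do.

trait Call<T: Copy> {
    fn call(&self, t: T);
}

// The wrapper's type is Wrap<'_, F>, so each nesting level adds a single
// Wrap<_> layer rather than mentioning F both as a generic parameter and in
// a capture list.
struct Wrap<'a, F>(&'a F);

impl<'a, T: Copy, F: Call<T>> Call<T> for Wrap<'a, F> {
    fn call(&self, t: T) {
        self.0.call(t);
    }
}

// An ordinary closure is adapted once, at the top of the call chain.
struct FromFn<F>(F);

impl<T: Copy, F: Fn(T)> Call<T> for FromFn<F> {
    fn call(&self, t: T) {
        (self.0)(t);
    }
}

struct S<P>(P, P);

trait CallMe<T: Copy> {
    fn call_me<F: Call<T>>(&self, t: T, f: &F);
}

impl<T: Copy> CallMe<T> for () {
    fn call_me<F: Call<T>>(&self, _: T, _: &F) {}
}

impl<T: Copy, P: CallMe<T>> CallMe<T> for S<P> {
    fn call_me<F: Call<T>>(&self, t: T, f: &F) {
        // Same shape as the original call_me, but Wrap(f) replaces the
        // wrapper closure.
        let wrapped = Wrap(f);
        self.0.call_me(t, &wrapped);
        self.1.call_me(t, &wrapped);
    }
}

fn main() {
    let s = S(S((), ()), S((), ()));
    s.call_me((), &FromFn(|()| ()));
}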

What can we do?

For type_length_limit specifically, the check could control the walk more directly and cache the sizes of subtrees, which should reduce the cost here to something linear (a toy sketch of this idea follows below).
I am, however, worried this may be inefficient for most "type lengths", which are probably going to be relatively small anyway, and where the current walk is likely pretty cheap.
IIRC type_length_limit was added to avoid the compiler spending a lot of time without appearing to do anything; I wonder if that could be solved some other way?
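
As a toy illustration of that first point (this is not rustc code; the Ty type and the numbers are made up), subtree sizes can be memoized by node identity, with Rc standing in for rustc's interned types:

use std::collections::HashMap;
use std::rc::Rc;

enum Ty {
    Unit,
    Tuple(Vec<Rc<Ty>>),
}

// Measure a type's "length", caching the size of every subtree by node
// identity so each distinct subtree is walked only once.
fn type_length(ty: &Rc<Ty>, cache: &mut HashMap<*const Ty, usize>) -> usize {
    let key = Rc::as_ptr(ty);
    if let Some(&n) = cache.get(&key) {
        return n;
    }
    let n = match &**ty {
        Ty::Unit => 1,
        Ty::Tuple(elems) => {
            1 + elems.iter().map(|e| type_length(e, cache)).sum::<usize>()
        }
    };
    cache.insert(key, n);
    n
}

fn main() {
    // Rebuild the doubling shape from the repro: level N is (level N-1, level N-1).
    let mut ty = Rc::new(Ty::Unit);
    for _ in 0..20 {
        ty = Rc::new(Ty::Tuple(vec![ty.clone(), ty]));
    }
    // An uncached walk would visit roughly 2^20 nodes; the memoized one
    // visits each of the ~20 distinct subtrees once.
    let mut cache = HashMap::new();
    println!("type length = {}", type_length(&ty, &mut cache));
}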

For closure captures, maybe they should refer to their parameters abstractly, so we never have to substitute captures (until we need to, e.g., compute layout)? Or even keep captures outside of the type itself (have we tried this before and found it inefficient?)
cc @rust-lang/wg-traits @rust-lang/wg-compiler-nll @arielb1

I'm nominating this issue for discussion at the next compiler team meeting, wrt the last two points above (type_length_limit and closure captures).

triage: tagging as P-medium, under the assumption that the exponential time blowup here is only observable if the user "opts in" via type_length_limit with large values. Leaving nominated for discussion since, well, it still needs to be discussed.

If the prior assumption is invalid, feel free to challenge that priority assignment.

assigning to self to try to resolve this, hopefully with @eddyb's input. Un-nominating.

i'm also seeing this issue using closures with futures where .and_then(|r| r.do() ) is reasonably unavoidable :-/

are there any tricks that might avoid the issue in this case, or even, any useful way of tracking down _where_ the compiler is struggling in a large async application?

@ryankurte There's not much you can do; the compiler is most likely struggling to compute the unnecessarily large type_length_limit number (not doing anything useful). Until that's fixed:

  • if you can, I'd highly suggest moving to async fn - that should be a net benefit
  • you can insert .boxed() calls strategically to avoid creating monstrously large future types (a sketch follows below)
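
As a hedged sketch of that second bullet (futures 0.3 style; the step and chain functions are made up for illustration), a .boxed() call erases the concrete combinator type so the nesting stops growing at that point:

use futures::future::{BoxFuture, FutureExt};

// Boxing hides the concrete future type behind BoxFuture<'_, u32>, so the
// caller's type no longer grows with the depth of the combinator chain.
fn step(n: u32) -> BoxFuture<'static, u32> {
    async move { n + 1 }.boxed()
}

fn chain() -> BoxFuture<'static, u32> {
    // Each .then would otherwise nest another combinator layer (plus its
    // closure) into the return type; .boxed() cuts that off here.
    step(0).then(step).then(step).boxed()
}

fn main() {
    // block_on comes with the futures crate's default "executor" feature.
    assert_eq!(futures::executor::block_on(chain()), 3);
}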

thanks for the boxed hint, i can compile (slowly) again now ^_^
last i tried async fn the interop was broken and there’s no way i can swap everything all at once, will have another look when i have a moment.

> i'm also seeing this issue using closures with futures where .and_then(|r| r.do() ) is reasonably unavoidable :-/
>
> are there any tricks that might avoid the issue in this case, or even, any useful way of tracking down _where_ the compiler is struggling in a large async application?

I ended up Box'ing most futures instead of using boxed(), since it's deprecated. Compile time increased. How would async syntax help here?

> How would async syntax help here?

async fn would simply not have all of these deep types (they come from the closures passed to future combinators).
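
As a hedged sketch of the contrast (fetch is a made-up async fn; the futures 0.3 combinators and executor are assumed):

use futures::future::TryFutureExt;

async fn fetch(id: u32) -> Result<u32, ()> {
    Ok(id + 1)
}

// Combinator style: every .and_then step (and the closure or function it
// takes) becomes part of the returned future's type, so the type deepens
// with every step in the chain.
fn combinators() -> impl std::future::Future<Output = Result<u32, ()>> {
    fetch(1).and_then(fetch).and_then(fetch)
}

// async/await style: the whole body compiles to a single generated future;
// no new closure type is nested into it per step.
async fn awaited() -> Result<u32, ()> {
    let r = fetch(1).await?;
    let r = fetch(r).await?;
    fetch(r).await
}

fn main() {
    let a = futures::executor::block_on(combinators());
    let b = futures::executor::block_on(awaited());
    assert_eq!(a, b);
}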

We experienced a similar issue https://github.com/ZcashFoundation/zebra/issues/923 in recent nightlies, because we were using nested tracing::instrument spans (see https://github.com/tokio-rs/tracing/issues/616).

Using nightly-2020-08-13 or earlier resolves the issue. (Removing the nested spans also resolves the issue.)

Here is the regression report for rustc:

searched nightlies: from nightly-2020-08-12 to nightly-2020-08-15
regressed nightly: nightly-2020-08-14
searched commits: from https://github.com/rust-lang/rust/commit/576d27c5a6c80cd39ef57d7398831d8e177573cc to https://github.com/rust-lang/rust/commit/81dc88f88f92ba8ad7465f9cba10c12d3a7b70f1
regressed commit: https://github.com/rust-lang/rust/commit/0a49057dd35d9bd2fcc9760a054809c30eee2a58


bisected with cargo-bisect-rustc v0.5.2

Host triple: x86_64-unknown-linux-gnu
Reproduce with:

cargo bisect-rustc --start=2020-08-12 --end=2020-08-15

In Quinn we similarly observe that #75443 has triggered the need to set a type_length_limit in one of our crates for the first time (like zebra, we use tracing in that crate), tracked in https://github.com/djc/quinn/issues/839.

Note that quinn does not use tracing's proc macros; the offending code is a fairly simple use of a future combinator:

let results = future::join_all(peers.into_iter().map(|peer| async move {
    let span = info_span!("peer", name = peer.name.as_str());
    let result = run(&peer, keylog).instrument(span).await;
    (peer, result)
}));

Sounds like https://github.com/rust-lang/rust/pull/72412 might help here

To repeat a comment I made on #64496:

> @eddyb has said we should just get rid of this check. @nikomatsakis I'm curious to hear your thoughts on that. Personally I'm not sure; it seems like this can point to areas where compile time etc. would be improved by introducing a boxed future, but it doesn't necessarily do a _good_ job of that.

For reference, here's a change to raise all the limits we needed to in Fuchsia. Some of the limits are quite high (the highest I see is over 18 million). I will say that all of these are in related parts of the code, which tells me there might be a common denominator. I'm not aware of an easy way of finding it if there is, though. And people working in that part of the code will be left with juggling these arbitrary-seeming limits.

The compile fails immediately on hitting one of these errors, and the offending type may not be the biggest one in the compile, creating the really unfortunate experience of updating the limit only to have it fail again. At one point I just started increasing it myself by arbitrary amounts over what the compiler suggested.

The compile-time aspect of this was fixed in #72412.

(And the type length limit checks are being worked on in #76772.)

In the case of deltachat, the type_length_limit error is gone on nightly: https://github.com/deltachat/deltachat-core-rust/issues/1873
I assume #72412 reduced the lengths too?

Why didn't crater catch this bug? Does this mean type_length_limit wasn't triggered for any crates published on crates.io? Would publishing on crates.io have helped to detect the bug earlier, before the 1.46.0 release?

Ah, sorry for the confusion: we only walk each unique type once when measuring the type length now, so it has an effect on that too.

Confirmed that current nightly doesn't require the type length limit increase for Quinn anymore.
