If you look at the trans-stats for some of our modules, some "big functions" really stand out:
rustc:
16838 insns, 67 ms, middle::trans::foreign::trans_intrinsic
15988 insns, 145 ms, middle::typeck::check::check_expr_with_unifier
11759 insns, 87 ms, middle::privacy::check_crate
libextra:
5244 insns, 31 ms, terminfo::parm::expand
5073 insns, 21 ms, time::do_strptime::parse_type
4068 insns, 23 ms, time::do_strftime::parse_type
3610 insns, 73 ms, terminfo::parser::compiled::parse
2169 insns, 89 ms, net_tcp::listen_common
1952 insns, 28 ms, getopts::getopts
1937 insns, 193 ms, net_tcp::connect
If you look at the code generating them, they don't immediately look like they should be taking 10x as long as other functions; but they do all seem to use match somewhat heavily. This suggests to me that we're still compiling matches in some way that causes a combinatorial explosion of basic blocks, or something.
Investigate!
/cc @toddaaro
/cc @msullivan
I don't think it was actually toddaaro I was talking to about this...
I have whined a lot about match implementations that suffer from combinatorial explosion, so maybe that is why you thought of me.
This Scala issue is probably unrelated, but maybe reading about the experience will prove useful to a Rust developer someday.
Triage: did anyone determine if match was still the issue here? I'd imagine codegen has gotten better since 2013...
@steveklabnik I don't know of any changes to match codegen that would have alleviated this, re-running the measurements...
Yes, match is still generating more code than anything else, by a significant margin.
@cmr if you have a moment, can you explain how you checked this out? Future triage-rs will be thankful :smile:
make RUSTFLAGS="-Z count-llvm-insns" | tee output, followed by sort -n < output > output2, then look at the bottom of output2 (ctrl-f for "trans_match")
trans-ing a match appears to be quadratic in the number of arms (at least for integers and strings):
$ for N in 500 1000 2000 4000; do
echo $N
python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" |
rustc - -Z time-passes |
grep translation
done
500
time: 0.030; rss: 65MB translation
1000
time: 0.107; rss: 67MB translation
2000
time: 0.427; rss: 70MB translation
4000
time: 1.620; rss: 75MB translation
(The python invocation prints code like fn main() { match 0 { 0 => {} 1 => {} 2 => {} 3 => {} 4 => {} _ => {} } } with $N + 1 arms.)
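As a quick back-of-the-envelope check (mine, not part of the original report): if trans really is quadratic in the number of arms, the translation times above should roughly quadruple each time N doubles.

```python
# Reported translation times from the measurements above (seconds).
times = {500: 0.030, 1000: 0.107, 2000: 0.427, 4000: 1.620}
ns = sorted(times)

# For quadratic scaling, doubling N should roughly quadruple the time.
ratios = [times[b] / times[a] for a, b in zip(ns, ns[1:])]
print([round(r, 2) for r in ratios])  # each ratio is close to 4.0
assert all(3.0 < r < 5.0 for r in ratios)
```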
Results of @huonw's script above with rustc 1.11.0-nightly (34505e222 2016-06-08):
500
time: 0.088; rss: 97MB translation
1000
time: 0.308; rss: 100MB translation
2000
time: 1.247; rss: 101MB translation
4000
time: 4.723; rss: 106MB translation
With -Z orbit:
500
time: 0.009; rss: 95MB translation
1000
time: 0.015; rss: 97MB translation
2000
time: 0.017; rss: 101MB translation
4000
time: 0.030; rss: 104MB translation
MIR trans looks to be a lot better at this case.
@brson
MIR trans's algorithm is indeed very careful to generate a linear number of basic blocks.
Match checking currently takes the most time, which is rather unsurprising since I think it's O(n^2) code. I'm going to leave this open for the time being; realistically, matches with this many arms are excessive, but I plan to take a look at the code soon and see whether it can be optimized a little.
$ for N in {10..100}; do echo -n "$((2**$N)): "; python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in xrange(0, 2 ** $N)))" | rustc - -Z time-passes | rg --color never 'match'; done
1024: time: 0.021; rss: 93MB match checking
2048: time: 0.083; rss: 96MB match checking
4096: time: 0.335; rss: 99MB match checking
8192: time: 1.340; rss: 107MB match checking
16384: time: 5.360; rss: 117MB match checking
32768: time: 21.446; rss: 142MB match checking
65536: time: 88.756; rss: 193MB match checking
131072: time: 380.206; rss: 300MB match checking
Actually, the current match checking code seems to be wrongly/poorly implemented. The original paper had linear performance on simple sequences like this.
EDIT: I interpreted the figures wrongly:
the X-axis shows the size of the matches squared (expressed as source file size squared)
I'd like to make it clear that match checking is an NP-complete problem, and quadratic average time is not very bad for such a complex algorithm.
However, it's true that it's taking up a significant portion of compile time, and we probably want to special-case trivial (C/switch-like) matches so that they use a loglinear algorithm instead.
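For illustration, the switch-like special case suggested here could look something like this (my own Python sketch, not rustc code, assuming all arms are integer literals): sort the arm values once, then find duplicate (unreachable) arms in a single scan, for O(n log n) total instead of comparing every arm against all earlier ones.

```python
def check_int_match(arm_values, has_wildcard):
    """Toy reachability/exhaustiveness check for a match whose arms are
    all integer literals. Sorting dominates: O(n log n) overall."""
    # Sort arm indices by value; Python's sort is stable, so arms with
    # equal values keep their source order.
    order = sorted(range(len(arm_values)), key=lambda i: arm_values[i])
    unreachable = []
    for prev, cur in zip(order, order[1:]):
        if arm_values[prev] == arm_values[cur]:
            # A later arm with the same value is shadowed by the earlier one.
            unreachable.append(cur)
    # In this toy model an integer match is only exhaustive via a wildcard.
    exhaustive = has_wildcard
    return sorted(unreachable), exhaustive
```

For example, `check_int_match([0, 1, 2, 1], True)` reports arm 3 (the second `1`) as unreachable without ever doing the quadratic all-pairs comparison.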
Triage:
~$ for N in 500 1000 2000 4000; do echo $N; python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" | rustc +devel - -Z time-passes | grep "checking 2"; done
500
time: 0.028 misc checking 2
1000
time: 0.087 misc checking 2
2000
time: 0.312 misc checking 2
4000
time: 1.220 misc checking 2
~$ for N in 500 1000 2000 4000; do echo $N; python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in range(0, $N)))" | rustc +devel - -Z time-passes | grep "match"; done
500
time: 0.029 rvalue promotion + match checking
1000
time: 0.087 rvalue promotion + match checking
2000
time: 0.313 rvalue promotion + match checking
4000
time: 1.233 rvalue promotion + match checking
triage: P-medium.
This is not believed to be a high priority work-item; in practice these match checking times, even for large matches, are minuscule compared to other passes in the compiler.
Still, it might be a fun thing for a volunteer to attack.
However, it's true that it's taking up a significant portion of compile time, and we probably want to special-case trivial (C/switch-like) matches so that they use a loglinear algorithm instead.
(Even this simpler step would probably be both easy and beneficial!)
It's 2020 and
$ for K in {8..16}; do
N=$((1 << K))
echo -n "$((N)): "
python -c "print('fn main() { match 0 { %s _ => {} } }' % ''.join(' %s => {}' % n for n in xrange(0, $N)))" |
rustc - -Z time-passes |
rg --color never 'match'
done
256: time: 0.001; rss: 88MB match_checking
512: time: 0.004; rss: 89MB match_checking
1024: time: 0.013; rss: 89MB match_checking # huh, rss is almost constant up to here
2048: time: 0.046; rss: 93MB match_checking
4096: time: 0.179; rss: 96MB match_checking
8192: time: 0.715; rss: 103MB match_checking
16384: time: 3.152; rss: 118MB match_checking
32768: time: 11.660; rss: 148MB match_checking
65536: time: 55.046; rss: 206MB match_checking
The match exhaustiveness algorithm is indeed completely quadratic: for every branch, we check whether it is redundant with any of the branches above. We could hack workarounds for some special cases like what #76918 did, but those usually stop being useful for non-trivial patterns, and I'm not sure they're worth it since most matches have only a few branches. (I would love some data to confirm that; if you want to help, see https://github.com/rust-lang/rustc-perf/issues/792)
A real solution would involve a pretty drastic redesign of the algorithm. I have been thinking about that in my spare time, but as @ishitatsuyuki mentioned, it's not easy.
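For intuition, here is a toy Python model (mine, not the real rustc algorithm, which works over pattern matrices) of why that check is quadratic: with n mutually distinct arms, comparing each arm against every arm above it performs exactly n(n-1)/2 comparisons.

```python
def count_redundancy_comparisons(patterns):
    """Toy model of the quadratic usefulness pass: each arm is compared
    against every arm above it; an arm is redundant if an earlier arm is
    identical or is the wildcard "_"."""
    comparisons = 0
    redundant = []
    for i, pat in enumerate(patterns):
        for earlier in patterns[:i]:
            comparisons += 1
            if earlier == pat or earlier == "_":
                redundant.append(i)  # covered by an earlier arm
                break
    return comparisons, redundant
```

With all-distinct arms this does the full n(n-1)/2 comparisons; at 131072 arms that is about 8.6 billion, consistent with the growth of the times measured above.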