Rust: NLL performance tracking issue

Created on 30 Jan 2018 · 7 Comments · Source: rust-lang/rust

This is a tracking issue dedicated to discussing ideas and techniques for improving compile time with the NLL feature. It's meant to serve as a repository of measurements, benchmarks, etc. for the time being, to help coordinate efforts.

Benchmarks and timing runs

We can view performance results on perf.rust-lang.org now. Just look for the "NLL" runs. They can be compared against "clean" runs -- the delta is the 'extra work' we are doing when NLL is enabled (note that NLL still runs the old region analysis and borrow check, so it does strictly more work).

Ideas for improvement or measurement

[x] Introduce a dirty list (https://github.com/rust-lang/rust/pull/47766)
[ ] More detailed profiling and analysis of the results above
[ ] Measure: what percentage of the time do calls to dfs wind up actually adding new info?
[ ] Idea: modify dfs code not to invoke successors(), which can allocate, but instead add some kind of non-allocating form (perhaps with a callback?)
[x] Experiment with sparse representation of values (https://github.com/rust-lang/rust/issues/48170)
[ ] ... others? Discuss below, that's what this issue is for =)
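
To make the non-allocating successors() idea concrete, here is a minimal sketch. It is not the rustc code: the `Cfg` type, `for_each_successor`, and `reachable` are hypothetical names invented for illustration, and the graph is a plain adjacency list. It contrasts an allocating `successors()` with a callback form that the walk can use without building a `Vec` per node.

```rust
/// Hypothetical control-flow graph (not rustc's MIR):
/// `edges[n]` lists the successors of node `n`.
struct Cfg {
    edges: Vec<Vec<usize>>,
}

impl Cfg {
    /// Allocating form: builds a fresh `Vec` on every call.
    fn successors(&self, node: usize) -> Vec<usize> {
        self.edges[node].clone()
    }

    /// Non-allocating form: hands each successor to a callback instead.
    fn for_each_successor(&self, node: usize, mut f: impl FnMut(usize)) {
        for &succ in &self.edges[node] {
            f(succ);
        }
    }
}

/// Depth-first walk from `start` that only uses the callback form.
/// Returns the nodes in the order they were first visited.
fn reachable(cfg: &Cfg, start: usize) -> Vec<usize> {
    let mut visited = vec![false; cfg.edges.len()];
    let mut stack = vec![start];
    let mut order = Vec::new();
    while let Some(node) = stack.pop() {
        // Mark the node; skip it if it was already marked.
        if std::mem::replace(&mut visited[node], true) {
            continue;
        }
        order.push(node);
        cfg.for_each_successor(node, |succ| {
            if !visited[succ] {
                stack.push(succ);
            }
        });
    }
    order
}
```

The walk still owns one stack and one visited table, but the per-node temporary disappears. Whether that matters in practice is exactly what the profiling items above are meant to find out.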

Quick pointers into the source

The main source of time for NLL is probably going to be the code that propagates constraints:

https://github.com/rust-lang/rust/blob/fe7e1a45f37f4265434cead827f587e75412f85c/src/librustc_mir/borrow_check/nll/region_infer/mod.rs#L453-L457
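
For readers unfamiliar with the dirty-list idea from the checklist above, here is a rough sketch of fixed-point propagation driven by a dirty list. It is an illustration under simplifying assumptions, not the linked code: `Constraint` and `propagate` are hypothetical, a region is just a `BTreeSet` of point indices, and a constraint simply says that one region must contain another (the real constraints also carry a point).

```rust
use std::collections::{BTreeSet, VecDeque};

/// Hypothetical constraint: region `sup` must contain every point of region `sub`.
struct Constraint {
    sup: usize,
    sub: usize,
}

/// Grows the regions until every constraint holds. Only constraints whose
/// `sub` region has changed are revisited. Returns how many constraints
/// were processed, which is handy for measuring the effect.
fn propagate(regions: &mut [BTreeSet<usize>], constraints: &[Constraint]) -> usize {
    // dependents[r] = indices of the constraints that read region `r`.
    let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); regions.len()];
    for (i, c) in constraints.iter().enumerate() {
        dependents[c.sub].push(i);
    }

    // Every constraint starts out dirty.
    let mut dirty: VecDeque<usize> = (0..constraints.len()).collect();
    let mut queued = vec![true; constraints.len()];
    let mut processed = 0;

    while let Some(i) = dirty.pop_front() {
        queued[i] = false;
        processed += 1;
        let c = &constraints[i];
        let new_points: Vec<usize> =
            regions[c.sub].difference(&regions[c.sup]).copied().collect();
        if new_points.is_empty() {
            continue; // nothing changed, so nothing downstream becomes dirty
        }
        regions[c.sup].extend(new_points);
        // `sup` grew: re-queue the constraints that read from it.
        for &j in &dependents[c.sup] {
            if !queued[j] {
                queued[j] = true;
                dirty.push_back(j);
            }
        }
    }
    processed
}
```

Compared with looping over all constraints until nothing changes, this only touches a constraint again when one of its inputs actually grew.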

And in particular the calls to dfs:

https://github.com/rust-lang/rust/blob/fe7e1a45f37f4265434cead827f587e75412f85c/src/librustc_mir/borrow_check/nll/region_infer/mod.rs#L486-L495

The dfs code must walk the graph to find which points should be added and where:

https://github.com/rust-lang/rust/blob/fe7e1a45f37f4265434cead827f587e75412f85c/src/librustc_mir/borrow_check/nll/region_infer/dfs.rs#L37-L40
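
As a rough picture of what such a walk does, here is a small sketch. It assumes the rule described in the NLL RFC, as I understand it: starting from the constraint point, visit every point reachable without leaving the source region and add it to the target region. The linked rustc code may differ in its details. `dfs_copy` and its parameters are hypothetical names, and regions are plain sets of point indices.

```rust
use std::collections::BTreeSet;

/// Walks the graph from `start`, staying inside `from`, and copies every
/// point it visits into `to`. Returns `true` if `to` gained at least one
/// point, i.e. if the call added new info.
fn dfs_copy(
    successors: &[Vec<usize>], // successors[p] = points that follow `p`
    from: &BTreeSet<usize>,
    to: &mut BTreeSet<usize>,
    start: usize,
) -> bool {
    let mut changed = false;
    let mut visited = BTreeSet::new();
    let mut stack = vec![start];
    while let Some(point) = stack.pop() {
        // Stop at points outside `from` and at points already visited.
        if !from.contains(&point) || !visited.insert(point) {
            continue;
        }
        changed |= to.insert(point);
        stack.extend(successors[point].iter().copied());
    }
    changed
}
```

Returning a `bool` also makes the measurement from the checklist cheap: count the calls and count how many return `true`.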

cc @rust-lang/wg-compiler-nll

C-tracking-issue I-compiletime NLL-performant T-compiler

Source

nikomatsakis

Most helpful comment

Good news:

The indomitable @Mark-Simulacrum has added a special "NLL mode" to http://perf.rust-lang.org/, so we can now visualize our performance very easily. The NLL point should be compared to "clean" -- the delta is the 'extra work' we are doing when NLL is enabled (note that NLL still runs the old region analysis and borrow check, so it does strictly more work).

Bad news:

We got our work cut out for us! =) See that little orange triangle in the upper right?

[Screenshot, 2018-03-23 1:54:21 pm]

nikomatsakis on 23 Mar 2018

All 7 comments

I did some profiling of syn. The top result was actually not what I expected:

+   14.45%    14.45%  rustc    librustc-1dc9e414ba4b7dab.so                  [.] rustc::infer::region_constraints::RegionConstraintCollector::take_and_reset_data

This is type-checker constraint collection; this may imply that some of the refactorings I had in mind for doing this in a smarter way are in order.

nikomatsakis on 30 Jan 2018

OK, that profiling run was kind of messed up, because I forgot to include debuginfo = 1, so that the framepointers were all wrong etc. Still, when I did a better run, the data was a lot more detailed, but pointed roughly to the same conclusion. The NLL type checker is killing a lot of time doing normalization. This is also responsible for some soundness bugs. This kind of ups the importance of doing the trait refactorings I had in mind to handle this scenario better, but I'd hate to block on this (they are kind of involved). I have to think if we can do them in a kind of lightweight way.

nikomatsakis on 30 Jan 2018

On a hunch, I tried the results of @Aaron1011's https://github.com/rust-lang/rust/pull/47920. This makes a big difference, bringing us down to 7s. Haven't looked at the detailed profiling yet.

nikomatsakis on 1 Feb 2018

Did some rough profiling and analysis. For syn, with #47920 applied:

- mir borrowck accounts for 18% of total run time
  - nll accounts for 16% of total run time
    - running the MIR type check accounts for 14% of total run time
      - add_drop_live_constraint accounts for 12% of total run time
    - the other 2% is not concentrated in any one place
      - rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::solve in particular is 0% (7 samples out of 1034)
- for comparison, AST-based borrowck accounts for 2%

nikomatsakis on 2 Feb 2018

So after https://github.com/rust-lang/rust/pull/48411 lands, this is the current profile of the syn crate:

lunch-box. perf focus  '{do_mir_borrowck}' --tree-callees --tree-min-percent 2
Matcher    : {do_mir_borrowck}
Matches    : 152
Not Matches: 776
Percentage : 16%

Tree
| matched `{do_mir_borrowck}` (16% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (12% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check (10% total, 0% self)
: : : | rustc_mir::borrow_check::nll::type_check::type_check_internal (10% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::type_check::_$u7b$$u7b$closure$u7d$$u7d$::haa6458083ae8ea7a (7% total, 0% self)
: : : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (7% total, 0% self)
: : : : : : | rustc_mir::borrow_check::nll::type_check::TypeChecker::fully_perform_op (5% total, 0% self)
: : : : : : : | rustc::infer::InferCtxt::take_and_reset_region_constraints (2% total, 0% self)
: : : : : : : : | rustc::infer::region_constraints::RegionConstraintCollector::take_and_reset_data (2% total, 0% self)
: : : : : : : : : | <rustc_data_structures::unify::UnificationTable<K>>::new_key (2% total, 0% self)
: : : : : : : | rustc::infer::InferCtxt::commit_if_ok (2% total, 0% self)
: : : : : : : : | rustc::traits::query::dropck_outlives::<impl rustc::infer::at::At<'cx, 'gcx, 'tcx>>::dropck_outlives (2% total, 0% self)

This is showing stuff that takes more than 2% of total execution time.

Here is the profile showing stuff that takes more than 1%:

lunch-box. perf focus  '{do_mir_borrowck}' --tree-callees --tree-min-percent 1
Matcher    : {do_mir_borrowck}
Matches    : 152
Not Matches: 776
Percentage : 16%

Tree
| matched `{do_mir_borrowck}` (16% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (12% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check (10% total, 0% self)
: : : | rustc_mir::borrow_check::nll::type_check::type_check_internal (10% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::type_check::_$u7b$$u7b$closure$u7d$$u7d$::haa6458083ae8ea7a (7% total, 0% self)
: : : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (7% total, 0% self)
: : : : : : | rustc_mir::borrow_check::nll::type_check::TypeChecker::fully_perform_op (5% total, 0% self)
: : : : : : : | rustc::infer::InferCtxt::take_and_reset_region_constraints (2% total, 0% self)
: : : : : : : : | rustc::infer::region_constraints::RegionConstraintCollector::take_and_reset_data (2% total, 0% self)
: : : : : : : : : | <rustc_data_structures::unify::UnificationTable<K>>::new_key (2% total, 0% self)
: : : : : : : : : : | <rustc_data_structures::snapshot_vec::SnapshotVec<D>>::push (1% total, 1% self)
: : : : : : : | rustc::infer::InferCtxt::commit_if_ok (2% total, 0% self)
: : : : : : : : | rustc::traits::query::dropck_outlives::<impl rustc::infer::at::At<'cx, 'gcx, 'tcx>>::dropck_outlives (2% total, 0% self)
: : : : : : : : : | rustc::infer::canonical::<impl rustc::infer::InferCtxt<'cx, 'gcx, 'tcx>>::instantiate_query_result (1% total, 0% self)
: : : : : : | <rustc_mir::dataflow::at_location::FlowAtLocation<BD>>::each_state_bit (1% total, 1% self)
: : : : | <rustc_mir::borrow_check::nll::type_check::TypeVerifier<'a, 'b, 'gcx, 'tcx> as rustc::mir::visit::Visitor<'tcx>>::visit_mir (1% total, 0% self)
: : | rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::solve (1% total, 0% self)

nikomatsakis on 22 Feb 2018

I'm gonna close this tracking issue. Not that the problem is solved, but I don't see this issue as adding a lot of value.

nikomatsakis on 6 Jun 2018
