This is a tracking issue dedicated to discussing ideas and techniques for improving compile-time with the NLL feature. It's meant to house as a repository of measurements, benchmarks, etc for the time being to help coordinate efforts.
We can view performance results on perf.rust-lang.org now. Just look for the "NLL" runs. They can be compared against "clean" runs -- the delta is the 'extra work' we are doing when NLL is enabled (note that NLL still runs the old region analysis and borrow check, so it does strictly more work).
dfs calls wind up actually adding new info?dfs code not to invoke successors(), which can allocate, but instead add some kind of non-allocating form (perhaps with a callback?)The main source of time for NLL is probably going to be the code that propagates constraints:
And in particular the calls to dfs:
The dfs code must walk the graph to find which points should be added and where:
cc @rust-lang/wg-compiler-nll
I did some profiling of syn. The top result was actually not what I expected:
+ 14.45% 14.45% rustc librustc-1dc9e414ba4b7dab.so [.] rustc::infer::region_constraints::RegionConstraintCollector::take_and_reset_data
This is type-checker constraint collection; this may imply that some of the refactorings I had in mind for doing this in a smarter way are in order.
OK, that profiling run was kind of messed up, because I forgot to include debuginfo = 1, so that the framepointers were all wrong etc. Still, when I did a better run, the data was a lot more detailed, but pointed roughly to the same conclusion. The NLL type checker is killing a lot of time doing normalization. This is also responsible for some soundness bugs. This kind of ups the importance of doing the trait refactorings I had in mind to handle this scenario better, but I'd hate to block on this (they are kind of involved). I have to think if we can do them in a kind of lightweight way.
On a hunch, tried the results of @Aaron1011's https://github.com/rust-lang/rust/pull/47920. This makes a big difference, bringing us down to 7s. Haven't looked at the detailed profiling yet.
Did some rough profiling and analysis. For syn, with #47920 applied:
add_drop_live_constraint accounts for 12% of total run timerustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::solve in particular is 0% (7 samples out of 1034)So after https://github.com/rust-lang/rust/pull/48411 lands, this is the current profile of the syn crate:
lunch-box. perf focus '{do_mir_borrowck}' --tree-callees --tree-min-percent 2
Matcher : {do_mir_borrowck}
Matches : 152
Not Matches: 776
Percentage : 16%
Tree
| matched `{do_mir_borrowck}` (16% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (12% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check (10% total, 0% self)
: : : | rustc_mir::borrow_check::nll::type_check::type_check_internal (10% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::type_check::_$u7b$$u7b$closure$u7d$$u7d$::haa6458083ae8ea7a (7% total, 0% self)
: : : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (7% total, 0% self)
: : : : : : | rustc_mir::borrow_check::nll::type_check::TypeChecker::fully_perform_op (5% total, 0% self)
: : : : : : : | rustc::infer::InferCtxt::take_and_reset_region_constraints (2% total, 0% self)
: : : : : : : : | rustc::infer::region_constraints::RegionConstraintCollector::take_and_reset_data (2% total, 0% self)
: : : : : : : : : | <rustc_data_structures::unify::UnificationTable<K>>::new_key (2% total, 0% self)
: : : : : : : | rustc::infer::InferCtxt::commit_if_ok (2% total, 0% self)
: : : : : : : : | rustc::traits::query::dropck_outlives::<impl rustc::infer::at::At<'cx, 'gcx, 'tcx>>::dropck_outlives (2% total, 0% self)
This is showing stuff that takes more than 2% of total execution time.
Here is the profile showing stuff that takes more than 1%:
1% profile
lunch-box. perf focus '{do_mir_borrowck}' --tree-callees --tree-min-percent 1
Matcher : {do_mir_borrowck}
Matches : 152
Not Matches: 776
Percentage : 16%
Tree
| matched `{do_mir_borrowck}` (16% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (12% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check (10% total, 0% self)
: : : | rustc_mir::borrow_check::nll::type_check::type_check_internal (10% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::type_check::_$u7b$$u7b$closure$u7d$$u7d$::haa6458083ae8ea7a (7% total, 0% self)
: : : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (7% total, 0% self)
: : : : : : | rustc_mir::borrow_check::nll::type_check::TypeChecker::fully_perform_op (5% total, 0% self)
: : : : : : : | rustc::infer::InferCtxt::take_and_reset_region_constraints (2% total, 0% self)
: : : : : : : : | rustc::infer::region_constraints::RegionConstraintCollector::take_and_reset_data (2% total, 0% self)
: : : : : : : : : | <rustc_data_structures::unify::UnificationTable<K>>::new_key (2% total, 0% self)
: : : : : : : : : : | <rustc_data_structures::snapshot_vec::SnapshotVec<D>>::push (1% total, 1% self)
: : : : : : : | rustc::infer::InferCtxt::commit_if_ok (2% total, 0% self)
: : : : : : : : | rustc::traits::query::dropck_outlives::<impl rustc::infer::at::At<'cx, 'gcx, 'tcx>>::dropck_outlives (2% total, 0% self)
: : : : : : : : : | rustc::infer::canonical::<impl rustc::infer::InferCtxt<'cx, 'gcx, 'tcx>>::instantiate_query_result (1% total, 0% self)
: : : : : : | <rustc_mir::dataflow::at_location::FlowAtLocation<BD>>::each_state_bit (1% total, 1% self)
: : : : | <rustc_mir::borrow_check::nll::type_check::TypeVerifier<'a, 'b, 'gcx, 'tcx> as rustc::mir::visit::Visitor<'tcx>>::visit_mir (1% total, 0% self)
: : | rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::solve (1% total, 0% self)
Good news:
The indomitable @Mark-Simulacrum has added a special "NLL mode" to http://perf.rust-lang.org/, so we can now visualize our performance very easily. The NLL point should be compared to "clean" -- the delta is the 'extra work' we are doing when NLL is enabled (note that NLL still runs the old region analysis and borrow check, so it does strictly more work).
Bad news:
We got out work cut out for us! =) See that little orange triangle in the upper right?

I'm gonna close this tracking issue. Not that the problem is solved, but I don't see this issue as adding a lot of value.
Most helpful comment
Good news:
The indomitable @Mark-Simulacrum has added a special "NLL mode" to http://perf.rust-lang.org/, so we can now visualize our performance very easily. The NLL point should be compared to "clean" -- the delta is the 'extra work' we are doing when NLL is enabled (note that NLL still runs the old region analysis and borrow check, so it does strictly more work).
Bad news:
We got out work cut out for us! =) See that little orange triangle in the upper right?