A local build of Servo's style crate shows a huge amount of time (46 seconds!) spent in name resolution. It turns out that basically all of it is spent calling span_to_snippet, specifically this one. That's a lot of unused imports!

Sure enough, if we generate synthetic files locally with this script:
```sh
N=20000
echo '#![allow(unused)]' > lots-of-unused.rs
for i in `seq 1 $N`; do
    echo "use foo as foo$i;" >> lots-of-unused.rs
done
echo 'fn foo() {}' >> lots-of-unused.rs
echo 'fn main() {}' >> lots-of-unused.rs

echo > lots-of-used.rs
for i in `seq 1 $N`; do
    echo "use test::foo$i;" >> lots-of-used.rs
done
echo 'mod test {' >> lots-of-used.rs
for i in `seq 1 $N`; do
    echo "pub fn foo$i() {}" >> lots-of-used.rs
done
echo '}' >> lots-of-used.rs
echo 'fn main() {' >> lots-of-used.rs
for i in `seq 1 $N`; do
    echo "foo$i();" >> lots-of-used.rs
done
echo '}' >> lots-of-used.rs
```
we find:
```console
$ rustc +nightly -V
rustc 1.21.0-nightly (aac223f4f 2017-07-30)
$ sh foo.sh
$ rustc +nightly lots-of-used.rs -Z time-passes 2>&1 | grep 'name resolution'
time: 0.033; rss: 164MB name resolution
$ rustc +nightly lots-of-unused.rs -Z time-passes 2>&1 | grep 'name resolution'
time: 1.139; rss: 136MB name resolution
```
That's a lot of time spent processing unused imports!
The slowdown here isn't quite the same as the style crate's, unfortunately: the style crate only has ~7k unused imports, which apparently takes it 48s in that profile. In any case the point is clear: lots of unused imports seem to slow down compilation, and the style crate looks to have a lot of unused imports!
cc @rust-lang/compiler
We could address this quickly by just emitting the first 10 unused-import warnings and then a final warning that there are more unused imports not shown.
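A minimal sketch of that capping logic (the function and constant names here are hypothetical, not rustc's actual diagnostics API):

```rust
// Sketch of the proposed cap: emit at most MAX_SHOWN unused-import
// warnings, then a single summary line for the rest.
const MAX_SHOWN: usize = 10;

fn render_unused_import_warnings(unused: &[String]) -> Vec<String> {
    let mut out: Vec<String> = unused
        .iter()
        .take(MAX_SHOWN)
        .map(|name| format!("warning: unused import: `{}`", name))
        .collect();
    let hidden = unused.len().saturating_sub(MAX_SHOWN);
    if hidden > 0 {
        out.push(format!("warning: {} more unused imports not shown", hidden));
    }
    out
}
```

For 20k unused imports this would emit 11 lines instead of 20k, and crucially would only need snippets for the first 10.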
I can never deal with more than 10 lints at a time myself anyway, so that seems like a reasonable solution to me!
Dumping out a ton of warnings is useful when you're hooked into an IDE and can see them as, e.g., green squiggles under the offending text, as opposed to 10k lines in a terminal.
I think the reason span_to_snippet is slow is because it calls ensure_filemap_source_present, which does this:
https://github.com/rust-lang/rust/blob/master/src/libsyntax/codemap.rs#L564
```rust
fn ensure_filemap_source_present(&self, file_map: Rc<FileMap>) -> bool {
    let src = self.file_loader.read_file(Path::new(&file_map.name)).ok();
    return file_map.add_external_src(src)
}
```
I don't think we want to read source files multiple times like that.
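The fix direction would be to read each file at most once and cache the contents, so repeated span_to_snippet calls don't re-read the file. A self-contained sketch, with illustrative stand-in types rather than rustc's real FileMap/CodeMap:

```rust
use std::collections::HashMap;

// Sketch: a loader that memoizes file contents by path.
struct CachingLoader {
    cache: HashMap<String, String>,
}

impl CachingLoader {
    fn new() -> Self {
        CachingLoader { cache: HashMap::new() }
    }

    // `read` stands in for the real filesystem read; it runs at most
    // once per path, after which the cached contents are returned.
    fn get_source<F: FnOnce() -> String>(&mut self, path: &str, read: F) -> &str {
        self.cache.entry(path.to_string()).or_insert_with(read)
    }
}
```

With something like this in place, 20k snippet lookups into one file cost one read plus 20k hash lookups instead of 20k reads.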
Thank you for looking into Servo compile times, Alex, and great find here! Could you CC me or someone from the Servo team on Servo-related issues? Thanks.
Can do!
In general I've always wanted the lint system to avoid even running in #[allow] blocks.
Currently the lint system runs all lints in parallel on the tree. This is great since it means we only traverse it once, and the cost of the traversal is consolidated. We don't need to reduce that cost itself (which is nice because then we don't have to worry about #[warn] inside #[allow]).
_Most_ lints do not maintain state. The lint framework has the capability to do so, and we use it a couple of times in Clippy, but most lints are `struct FooPass; impl LateLintPass for FooPass {...}`, so they won't notice if `check_expr` or whatever isn't called in a region of #[allow].
The "allowness" of a lint is tied to where in the tree the lint traversal is, _not_ the span emitted by the lint. It's perfectly possible to #[allow] a block of code and have the lint still be emitted there, because a different block of code (where it is #[warn]) hit the lint but emitted a span it got from elsewhere. We've had trouble with this before in Clippy, iirc.
All of this enables this proposal: If we had a specialized StatelessLateLintPass trait for this, then when we have an #[allow] we can turn off calling lint methods entirely, and when there's a #[warn] turn it on again.
It doesn't even have to be a trait, really; it could just be a defaulted fake-static method (until https://github.com/rust-lang/rfcs/pull/2027 happens) on LateLintPass.
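The defaulted-method variant of the proposal could look something like this. The trait, method names, and `FooPass` lint here are made up for illustration; rustc's real LateLintPass has many more methods:

```rust
// Sketch: LateLintPass grows a method, defaulting to "stateful",
// that stateless passes override. The framework can then skip a
// stateless pass entirely inside #[allow] regions.
trait LintPass {
    fn is_stateless(&self) -> bool {
        false // conservative default: assume the pass keeps state
    }
    fn check_expr(&mut self, expr: &str) -> Option<String>;
}

struct FooPass; // a typical stateless pass, as described above

impl LintPass for FooPass {
    fn is_stateless(&self) -> bool {
        true
    }
    fn check_expr(&mut self, expr: &str) -> Option<String> {
        if expr.contains("()") {
            Some("warning".to_string())
        } else {
            None
        }
    }
}

// The traversal consults the flag: stateless passes are skipped in
// #[allow] regions, stateful ones still run everywhere.
fn should_run(pass: &dyn LintPass, in_allow_region: bool) -> bool {
    !(in_allow_region && pass.is_stateless())
}
```

The default of `false` means existing stateful passes keep their current behavior, and passes opt in to being skipped.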
I suspect this will get significant speedups for crates like style. I _may_ be able to take a crack at this if y'all think this is a good approach.
It occurs to me that a boolean check per lint at each tree node might actually end up being more expensive. I'll have a look at this code more closely. It might be possible to make this fast by keeping a second array of trait objects that contains only the active lint passes.
It seems like we do a large enough amount of work that the whole "vector of active lints" stuff won't really be necessary.
However, I just realized that this bug is not in the lint pass but in the resolution phase (which defers linting to the actual lint pass). This optimization won't help on its own, though we can fix the bug by doing the span_to_snippet in the actual lint and then making this optimization.
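The shape of that fix is to have resolution record only the span of the unused import, and have the lint pass do the expensive snippet lookup lazily, only when the lint actually fires. A rough sketch with hypothetical names (`Span` here is a toy stand-in, and the `allowed` flag stands in for the lint level at the emission site):

```rust
// Sketch: defer the span_to_snippet analogue to lint time, so code
// under #[allow(unused)] never pays for reading the source text.
struct Span {
    lo: usize,
    hi: usize,
}

fn unused_import_message(span: &Span, allowed: bool, source: &str) -> Option<String> {
    if allowed {
        // Under #[allow(unused)] we return early and never touch the
        // source text, so no file read happens at all.
        return None;
    }
    // Only a lint that actually fires pays for the snippet.
    Some(format!("unused import: `{}`", &source[span.lo..span.hi]))
}
```

This is exactly the case in the style crate: with `#![allow(unused)]` at the top, all 7k lookups would be skipped.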
So in that case I suspect this optimization is unnecessary. In general we've rarely had performance issues with lints in clippy (which has way more lints than rustc), and we've been pretty careful to measure impact there.
Still, if folks think this optimization can help, let me know.