There are ways to more or less easily obtain code coverage information from Rust binaries, some of which are in widespread use (e.g. travis-cargo + gcov + coveralls.io). However, these are either platform-specific (gcov/kcov are Linux-only) or incomplete (coverage for documentation tests is not collected).
It would be better if rustc could instrument Rust binaries and tests using LLVM Coverage Instrumentation. This would allow code coverage information to work portably across the supported platforms, as well as produce reliable coverage information.
Ideally, such support would be integrated into an easy-to-use `cargo coverage` subcommand and the Rust Language Server, so that IDEs can use this information to guide the user while writing tests.
I know I'd love to see this work! Do you know what it would entail, i.e. what hoops a prototype would have to jump through? Also, does this work reliably on platforms like Windows? Or is it still best on Linux and "works mostly elsewhere"?
A prototype would need to generate the LLVM IR for the instrumentation; for that, it needs to generate a mapping between source ranges and performance counters (I would start with just functions). Then it needs to generate and embed the IR in the object files. It would be a good idea to look at how clang then outputs this into a file so we can use the same format (which is documented), and test that we can read it with `llvm-cov` on Linux and macOS (I don't know about Windows support, but since clang is gaining full Windows support, if it's not there it will be there soon).
I think that would be enough for a prototype; from there we can move on to generating instrumentation for conditionals (match/loop/jumps/ifs...) and blocks (how many times a loop body was executed). We could go down to expressions, but then it would be wise to offer users a way to control how much instrumentation is generated: functions, branches, and loops (which are branches) are typically enough. We should then support skipped regions (for conditional compilation), expansions (for macros, maybe plugins), and dealing with unreachable code (there is a performance counter that is always zero for that).
The meat of the work is in the source => performance counter mapping, generating the IR, and generating the conforming output. `llvm-cov` works at least on Mac and Linux, but IIRC clang has options to generate coverage information even for particular gcov versions. The source code for all this is available, so taking a look wouldn't hurt.
> Also, does this work reliably on platforms like Windows? Or is it still best on Linux and "works mostly elsewhere"?
It works on Mac and Linux; I don't know about Windows. Even if it doesn't work everywhere, this is probably the way to make it work on as many platforms as possible anyway. Clang support on Windows is pretty good already and it is only getting better.
@sujayakar's done some awesome work to prototype this on OSX at least, discovering that compiling with `-C passes=insert-gcov-profiling` and linking with `-C link-args=-lclang_rt.profile_osx` works, and that that library can be located via `xcrun --find clang`, in a path like `usr/lib/clang/7.3.0/lib/darwin/libclang_rt.profile_osx.a`. Running the instrumented binary produces a `.gcda` file, and there's an `llvm-cov` tool that can be run over the output file to generate coverage information. That should at least get code coverage working on OSX in some respect! This is also a pretty strong case that it may not be too hard to actually get this working for _all_ platforms if LLVM's standard tool suite "just works".
@alexcrichton That's gcov coverage, which is completely different from what @gnzlbg meant in this issue.
The correct pass name is `instrprof`, implemented in lib/Transforms/Instrumentation/InstrProfiling.cpp. See also LDC's LLVM profiling instrumentation.
@sanxiyn is correct.
I just want to add that I don't think we should implement clang-like gcov coverage generation.
The LLVM format can be used for many more things (e.g. profile-guided optimization), and LLVM's `llvm-cov` tool can generate gcov files from it.
AFAIK, there's a PR open for this: https://github.com/rust-lang/rust/pull/38608
As far as I can tell, #38608 is still gcov coverage and not instrprof.
I believe this is now implemented as `-Z profile` and tracked at https://github.com/rust-lang/rust/issues/42524, so closing in favor of that.
This is not implemented yet. LLVM coverage is different from gcov coverage, which is also different from sanitizer coverage. LLVM implements three separate coverage mechanisms.
Would LLVM coverage (`instrprof`) enable usable code coverage of generics-heavy code? None of the existing code coverage solutions work well with heavy use of generics.
My project (a gameboy emulator) makes very heavy usage of generics and it'd be wonderful to get code coverage working on integration tests but I'm not really sure how.
I think good code coverage support is the single most important piece of tooling missing in Rust.
When I run `cargo test` and everything is green, I get a good feeling, but this feeling actually means nothing at all. To make `cargo test` mean something, it would need to at least tell me how many code-paths of my application have been exercised.
This information does not mean much either (just because a code-path was exercised does not mean that it was exercised for all inputs), but it would mean more than nothing, and it would make writing tests and assessing the value of tests way easier.
IMO adding this kind of capability to `cargo test` (and yes, it should be part of `cargo test`; people should always know their code coverage) would be extremely valuable.
There is only one other piece of tooling infrastructure that would come close to this in value, and that would be an undefined behavior detector for all Rust code.
This might sound like a rant, but I just want to raise awareness that this is a very important issue, at least for me (and, from the other issues being filed, for others as well), because it directly affects the correctness of Rust programs (we don't want them to only be memory safe, but also to have correct logic).
Maybe if `rustfmt`, clippy, and the RLS advance enough during the impl period, the Rust tools team could prioritize these two tools next?
If I don't have precise auto-completion information or my source code is not perfectly formatted, well, I can live with that. But if my program has undefined behavior and I am not catching it because I am only testing 40% of my code-paths then... I am screwed. Does this make sense?
> This is not implemented yet. LLVM coverage is different from gcov coverage, which is also different from sanitizer coverage. LLVM implements three separate coverage mechanisms.
Hi, can anyone explain what the difference is between "LLVM coverage" (`profile-instr-generate`) and "gcov" (`insert-gcov-profiling`)? I don't know the detailed implementations of these two passes, but will `profile-instr-generate` be more accurate than `insert-gcov-profiling`?
I found @kennytm's answer to my question. Let me put it here in case someone like me gets lost in the overwhelming number of references: https://github.com/rust-lang/rust/pull/44673#issuecomment-330435323
I've done a bit of prototyping on this to see what would be involved in using `instrprof` for coverage, and it's revealed (to me) that the coverage aspect of this feature is secondary to the "Profile Guided Optimization" feature that `instrprof` was originally written for.
So, here's a vague "plan of action" (with progress so far) to enable PGO and LLVM-assisted coverage in `cargo`:
This is currently implemented in its most basic form as `-Zpgo-gen`, which adds the `-pgo-instr-gen` and `instrprof` passes to LLVM to add instrumentation of all functions and basic blocks (loop bodies, if statements, etc.).
There may be other places where it's useful to add instrumentation (e.g. adding vtable-downcast counts to enable `-pgo-icall-prom` passes, which turn dynamic dispatch into static dispatch in fast paths).
:+1:
With the above work complete, the generated binaries will count (in-memory) each time an instrumented code path is hit. To extract this information, the profile-rt library should walk the various `__llvm_prf_X` sections of the binary/memory and dump the values out to a file (Clang dumps to `default.profraw`). The format of this file is (as far as I can see) "documented" in LLVM/Clang's codebase. On the other hand, the current implementation of Rustc's profile-rt library is taken wholesale from LLVM and thus already supports this function.
:+1:
With a `default.profraw`, one can create a `profdata` file (using `llvm-profdata merge`). This file can be passed to the `pgo-instr-use` (and other) passes to enable optimisations based on gathered profile information. Again, this is supported in nightly Rustc today (through `-Zpgo-use`).
:+1:
LLVM's coverage tooling can use a "Coverage Map" to map between instrumentation counters and blocks of code. The format of the map is well documented at http://llvm.org/docs/CoverageMappingFormat.html and there's code in LLVM (http://llvm.org/doxygen/classllvm_1_1coverage_1_1CoverageMappingWriter.html) for creating the magic format. Note that this may require Rustc manually inserting instrumentation (rather than using `-pgo-instr-gen`) since it needs to know which counter corresponds to which code block. The code in clang (https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CodeGenPGO.cpp) that does this is probably a good place to start.
Note that the LLVM coverage map format supports various useful features, not well handled in GCOV-style coverage:
❌
Creating a `cargo coverage` subcommand (or, possibly better, just creating coverage from `cargo test`) by adding profiling to the `test` cargo profile and running the resulting profile data through LLVM's tooling would be really cool. Especially if the output format is compatible with tools like Coveralls.io and can produce useful graphical outputs for analysis afterwards.
Likely requires stabilising `-Zpgo-gen`, or at least exposing enough of it that cargo can create a profiled build.
:x:
There are some bits of Rust code for which coverage is non-obvious:
* How do we handle non-instantiated generics? If a generic is never instantiated then no code is created for it, so there are no counters for it.
* Similarly, should separate instantiations of the same generic be covered separately? If I test `Foo<i32>` in my UTs but use `Foo<String>` in my real code, have I really tested my code (in this case, `String` needs `drop` so maybe there's a subtlety there)? I don't think `llvm-cov` knows how to handle this in the coverage map at the moment, but it might make sense for it to do so to support this for C++ templates too.
* Proc-macros (and `#derive`s) need care too - the expanded code coverage is potentially uninteresting (if you trust the `proc-macro`) or critical (if you don't).

> How do we handle non-instantiated generics? If a generic is never instantiated then no code is created for it, so there are no counters for it.
I think Rust should behave the same as Clang here. And if I remember correctly, non-instantiated templates are reported.
> Similarly, should separate instantiations of the same generic be covered separately? If I test `Foo<i32>` in my UTs but use `Foo<String>` in my real code, have I really tested my code (in this case, `String` needs `drop` so maybe there's a subtlety there)?
`llvm-cov` at least has the option to show coverage per instantiation (or merged, via `-show-instantiations`). So I think it makes sense to provide this as well.
Great write-up @bossmc! I'll take a stab at adding LLVM coverage map generation, unless it's already being worked on elsewhere?
Whilst I'm happy to try and figure out a solution on my own, a little direction from the compiler team would no doubt ensure that I arrive at something that's acceptable far sooner! Would anyone be willing to mentor this?
Cc @michaelwoerister
I don't have time to mentor this but feel free to ask specific questions!
@bossmc Are you sure that `-Zpgo-gen` (now stabilised as `-C profile-generate`) adds LLVM's `-pgo-instr-gen` and `-instrprof` passes? My understanding is that it "merely" instruments via gcov, which is the very issue under discussion here? Have I missed something?
`-C profile-generate` is exactly the same as Clang's `-fprofile-generate`.
Thanks @michaelwoerister, that's what I thought: the difference being, AIUI, that `-fprofile-generate` instruments the IR whereas `-fprofile-instr-generate` (which this issue seeks) effectively instruments the AST (via source mapping).
Maybe debuginfo can help there?
Unfortunately, for the reasons @kennytm gave in the comment to which @mssun linked above, debuginfo is not sufficient.
@eggyal yes, `profile-generate` enables two passes:
* `pgo-instr-gen`, which heuristically adds http://llvm.org/docs/LangRef.html#llvm-instrprof-increment-intrinsic (and related intrinsics) to the LLVM IR, basically by adding them to the beginning of code blocks.
* `instrprof`, which lowers the above intrinsics to "real code". I assume these are intrinsics so the implementation of profile counting can be smart (thread-safe in threaded programs, using native types where needed, etc.).

Together (with the profile-rt library) these allow generation and subsequent use of profiling data during compilation. Unfortunately, together these two passes are insufficient for code-coverage usage, since the LLVM IR is too far away from the AST. Part 4 of my write-up discusses writing the coverage mapping code that maps AST nodes to instrumentation intrinsics - this is what then enables frontend-guided coverage.
I suspect that adding the coverage map will require disabling the `pgo-instr-gen` pass and adding the intrinsics manually (this seems to be what clang calls `profile-instr-generate`?), because there's Rust-specific knowledge that can/should be applied (e.g. generic expansions, semantically equivalent AST changes, etc.) and because there's no way to guess the profiling entries that will be generated by `pgo-instr-gen`.
https://github.com/rust-lang/rust/pull/48346/files#diff-2030f049f8df2454e3720ea9403bc14aR446 is where we started adding the `pgo-instr-gen` pass.
I am very eager to see this feature work for Rust. I was the person who designed and originally implemented Clang's profile-instr-generate feature (for both PGO and code coverage). I'm currently working on a project that is implemented in Rust and I miss the ability to get good code coverage data. I'm not yet familiar with the internal workings of the Rust compiler, I'd be happy to consult with anyone who is working on it. I can start by confirming that to get the best results, the instrumentation should be added in the Rust frontend by walking the AST, prior to any internal lowering so that the AST corresponds closely to the source constructs. Adding instrumentation in an LLVM pass can work OK for PGO but it's not what you want for code coverage.
Thank you both @bossmc and @bob-wilson — all very helpful! I've spent some time over the past week getting to grips with how instrprof is implemented in Clang, and have begun prototyping some broad structures of a Rust implementation.
I had pretty much come to exactly the conclusion you mention, @bob-wilson: in order to closely align coverage with the source, instrumentation will need to be added before the AST is lowered (or at least, before HIR is lowered to MIR—this seems slightly better to me as macros will then have been expanded). However, in order to maintain backend-independence, this presumably means that new instrumentation intrinsics will need to be introduced into the MIR (and possibly the HIR)?
As an intermediate step before working on that goal, I am currently thinking that I will first instrument the MIR on lowering to LLVM IR—and thereby prove that I have both instrumentation and coverage map generation working at that level first. But perhaps that would be unnecessarily arduous and I would be best to start with the AST/HIR?
Otherwise I do think that I have a reasonable feel right now for my next steps, but am more than happy to work on this with anyone else who'd like to contribute! Also very grateful to those who've offered their input, especially since this is my first foray into the internals of rustc.
I think the instrumentation and coverage map should be generated from the ASTs after macro expansion and before lowering to HIR. Adding the instrumentation is relatively easier than generating the coverage mapping, so I wouldn't bother doing that on MIR first -- you would probably end up throwing most of it away. I do agree with you that it would be best to start with the instrumentation, and adding instrumentation intrinsics to HIR and MIR seems like the right thing to do.
Another thing that I've been pondering is exactly what should be excluded from the coverage map? Clang excludes system headers, but perhaps we should exclude all dependencies from outside the package/workspace being built?
Can anyone confirm the semantics of `syntax::source_map::SourceFile::is_imported(&self) -> bool`?
I think (though I suspect @bob-wilson will know better and can feel free to correct me) that the coverage format has the capability to represent "macros" (remote code) sensibly (https://llvm.org/docs/CoverageMappingFormat.html#file-id) so maybe we'd want to apply instrumentation directly to the parse tree?
@bossmc: Aye, expansion regions are perfect for macros—and they can be fully generated from the Span info in the HIR, whereas the parse tree would not have the expanded code to reference (thus requiring subsequent modification of any found expansion regions to complete the referencing).
Ah yes, I'm being slow :) - so long as the expanded code knows it came from a macro then HIR is the correct point indeed.
Based on the RustC guide, I had the impression that HIR has a number of lowering changes that diverge from a 1-1 mapping to the source code. Trying to map back to the exact source locations for the original constructs seems like it may be difficult. The RustC guide also describes macro expansion as happening prior to lowering to HIR. Is that out of date? It would definitely be best to insert the instrumentation and build the coverage mapping at a point where macros have been expanded, but I was hoping that could be done before other lowering transformations.
Perhaps one of the seasoned rustc developers could chip in on this, as I'm not entirely certain, but after digging through the source for a bit it seems to me that MIR is perfect for instrumentation and source-mapping: I suspect that the coverage regions for a given function are exactly the MIR `Statement` descendants of its `Body` that are `StatementKind::Assign`, and their source locations can readily be obtained from their Span data (yielding either a code region or an expansion region); skipped regions might even be identifiable by `Statement` descendants that are `StatementKind::Nop`, although perhaps some such regions will not have any presence whatsoever in the MIR?... which then might only leave gap regions (that I confess I don't yet entirely understand).
> which then might only leave gap regions (that I confess I don't yet entirely understand)

I understood it to be for empty statements, e.g. `if (condition) {/*gap*/}`?
Edit: see below.
At least that's what I'm gathering from CoverageMapping.h:
/// A GapRegion is like a CodeRegion, but its count is only set as the
/// line execution count when its the only region in the line.
Although you may very well be able to implement some kind of code coverage support at the MIR level, that would be a different approach than what we did in Clang. One key idea from Clang's implementation is that both the instrumentation and the coverage map are directly tied to the source constructs. As one example, take an "if" statement. The instrumentation is done by inserting a counter at the start of the "then" block. If there is an "else" block, there's no need to put a counter there because you can calculate the execution count from the parent count and the "then" block count. This also holds true in the coverage map, where the goal is to precisely map the execution counts back to source locations.

So why should this be done before any lowering? As an example, take one thing that I saw in the RustC Guide: "if let" is lowered to HIR as a "match". A match statement can have any number of arms, so it generally needs to be instrumented differently than an "if" statement. When lowering from an "if let", presumably there is only one arm (or two, if there was an "else") but even if there is a way to recognize that, it would need to be a special case and I suspect it would get quite complicated. Could you just go ahead and instrument it as a "match"? Sure, but then you're likely to run into challenges mapping back to the exact source locations.

I can give an example of a gap region that is somewhat relevant here. Say that you put a blank line between the "if let" condition and the start of the "then" block. What execution count should be displayed for that blank line? Clang treats this as a gap region from the end of the condition to the start of the "then", and it chooses to use the "then" block count for that region. I haven't given it a lot of thought, but I think the answer may be different for a "match" statement. It would be strange for that blank line to show the execution count of the first arm of the match.
That's really helpful, thanks @bob-wilson.
I'm definitely not averse to injecting the instrumentation earlier (it was what I had earlier thought would be needed), but I'm still not completely convinced that MIR lacks the necessary information to produce the desired result. Taking the example you mention, of `if` being lowered to a `match`, the generated MIR `TerminatorKind::SwitchInt` will have the relevant number of `.values`/`.targets`, and a counter expression can be used in place of an outright counter if appropriate, irrespective of whether the original source was an `if` or a `match`. So far from being a special case, it actually enables `match` expressions with fewer arms to be handled with counter expressions in exactly the same way as an `if`. Nor do I think there's any issue mapping back to source locations, as the containing `Terminator` structure has `.source_info` that contains the generating Span.
Part of my comment with that example is that I suspect you might want different behavior for the gap region after the "if let" condition than for the gap region after a match condition with 1 arm, and you may not be able to distinguish them easily after lowering.
The other missing part of this discussion relates to using these profiles for PGO in the context where someone wants to save a profile as part of their project (e.g., in a case where collecting a good profile is labor intensive). We had a design goal that the profile should remain usable in that scenario even if you had updated your compiler version so that the internal lowering changed. We did that by tying the profile structure to the source code structure as reflected in the pre-lowering ASTs. I don't know if Rust will ever support that style of instrumentation-based PGO, but I think it is still a valuable feature for some styles of development and it would make sense to me to implement the coverage profiling in a way that would align with that in case someone wants to support it in the future.
More generally, I think you can probably follow Clang's lead in many aspects of this work. It's an approach that has been tried and works well in practice. If there's a compelling reason to do it differently in Rust, then it might make sense, but otherwise, my advice is to follow Clang's approach.
Is there even a need to leave out the counter on the `else` block? Isn't that one of the reasons that Clang's coverage does not work reliably when exceptions are thrown?
@pJunger: It's not "leaving out" the counter, but rather providing means for deducing its value with zero runtime overhead.
@bob-wilson: I'm not sure I'm following the thinking re the gap region differences following `if let` and `match`; would you be able to elaborate with an example? You're no doubt right about the PGO side of things (especially when rustc's internal structures are still quite fluid) and the principle of following Clang generally... I'll give it some more thought! My only inkling for starting from MIR is that it has essentially done a lot of the grunt work already, and starting from the AST would be duplicating much of that work.
Leaving out the "else" counter reduces the runtime overhead of the instrumentation and produces smaller profile files. There might be some 2nd order effects related to throwing exceptions, but the primary trouble with exceptions is that we thought it would be too expensive to insert a counter after every function call. The issue here is that throwing an exception can essentially cause a difference between the execution count going into a function call and the count coming out of it. The brute force approach to detect that would be to put a counter after every call. You might be able to do something clever with automatically generated exception handlers, but I'm not aware of anyone who has looked into that.
@eggyal : Consider "if let condition" followed by a blank line vs. "match condition" followed by a blank line. What execution count should be displayed for that blank line? For the "if", Clang shows the "then" count. The corresponding behavior for the "match" (or a match lowered from "if let") would be to show the count for the first arm of the match. That just seems wrong to me.
Okay, I understand... although wouldn't this be more a coverage tool issue than an instrumentation issue? Indeed, shouldn't the coverage tool show the execution count of any lines between if/match arms as being either skipped or that of the containing `if { }`/`match { }` region itself, i.e. without any need for explicit gap regions?
This is definitely outside my area of expertise, but some observations:
* the `if let` vs `match` distinction is captured in the `MatchSource` field of each `hir::ExprKind::Match` value.

Yes, it's a coverage tool issue. There are a lot of cases like this where there is no clear right answer and the best compromises require understanding the semantics of the code. (I.e., you want the compiler to figure it out instead of relying on some pattern matching in an IDE editor.) The instrumentation part is relatively easier than the coverage mapping, and you definitely want the instrumentation and coverage mapping to be generated from the same ASTs, so be sure to think about the coverage tool issues when deciding where to insert instrumentation.
Regarding the gap regions, I'm not the expert on that part, but I remember that there was quite a lot of iteration trying to come up with solutions that most people were happy with. It's really not easy. For example, for blank lines between match arms, you don't want them to show up as being unexecuted, but using the count from the containing region can give very surprising results, e.g., if none of the match arms ever execute but the blank lines between them have large non-zero counts from the enclosing region. Most people seem to expect that those blank lines will be included as part of the code region right before them, but then you run into problems with gaps after control flow.
@rkruppe Clang does not generate coverage instrumentation from LLVM IR. That's way too low level to do a good job. For gcov-style instrumentation, we used to rely on debug info to get back to source locations, and that really doesn't work well in practice. Debug info tends to be optimized to work well for single stepping in a debugger, and it's not so good for precisely identifying code coverage regions.
@bob-wilson Obviously this is better for runtime performance & data sizes.
I'm just asking because in the automotive industry every decision & condition has to be covered (including those that throw/panic), so taking over the design from Clang 1:1 would make this instrumentation useless for that. (And this is a bit sad, because otherwise you could interpolate C/D coverage from those counters and compete with Bullseye Coverage and the like.)
@bob-wilson
> @rkruppe Clang does not generate coverage instrumentation from LLVM IR. That's way too low level to do a good job. For gcov-style instrumentation, we used to rely on debug info to get back to source locations, and that really doesn't work well in practice. Debug info tends to be optimized to work well for single stepping in a debugger, and it's not so good for precisely identifying code coverage regions.
Yes, I draw the analogy between generating instrumentation from MIR and generating it from LLVM IR to suggest that generating it from MIR won't be any better.
> Regarding the gap regions, I'm not the expert on that part, but I remember that there was quite a lot of iteration trying to come up with solutions that most people were happy with. It's really not easy. For example, for blank lines between match arms, you don't want them to show up as being unexecuted, but using the count from the containing region can give very surprising results, e.g., if none of the match arms ever execute but the blank lines between them have large non-zero counts from the enclosing region. Most people seem to expect that those blank lines will be included as part of the code region right before them, but then you run into problems with gaps after control flow.
Not that I propose addressing it, but this does rather sound like a UI issue: the execution counts relate to regions rather than lines, but (presumably for historical reasons) the tools only display line counts, and these problems seem to stem from that disparity? Anyway, I almost have both instrumentation and mapping generation working from MIR and hope from that to glean a much better understanding of these problems! I will then work on moving to the AST instead.
@bob-wilson: I'm delighted to say that I now recognise the folly of trying to generate these mappings from the MIR: it's not practical to comprehend program flow through the BasicBlocks well enough to establish the code regions. Nevertheless I'm pleased that I undertook the exercise to understand the problem better—thank you for patiently indulging me whilst I caught up with you!
By the way, maybe this goes without saying, but I definitely recommend following the Clang implementation as closely as makes sense. For the instrumentation code, see lib/CodeGen/CodeGenPGO.{h,cpp}. The coverage mapping code is in lib/CodeGen/CoverageMappingGen.{h,cpp}
I guess one difference will/should be that `return` and `?` need to be handled?
Any news about this? 💔
@erikjara I've made some good progress but then some other projects demanded quite a bit of focus and I haven't looked at this for a good month or so—hoping to get back on it after Christmas.
(For others who hadn't noticed this, the first part of this has now been merged in #73011!)
So I just saw the help text for the flag:

> `-Z instrument-coverage=val` -- instrument the generated code to support LLVM source-based code coverage reports (note, the compiler build config must include `profiler = true`, and is mutually exclusive with `-C profile-generate`/`-C profile-use`); implies `-C link-dead-code` (unless explicitly disabled) and `-Z symbol-mangling-version=v0`; and disables/overrides some optimization options (default: no)
And there is an open issue that can make `-C link-dead-code` problematic if you have a crate doing C bindings in your build tree, which would definitely impact the usability/accuracy of the results: https://github.com/rust-lang/rust/issues/64685
> there is an open issue that can make `-C link-dead-code` problematic
@xd009642 Thank you!! ❤️
You provided a key clue to resolving an issue running Rust binaries instrumented with `-Zinstrument-coverage` on MSVC!
Linking never failed (or at least, it didn't generate error messages), but when running the program, the program would usually SegFault during the post-exit phase that saves the profiling counters to the `*.profraw` file. (If it didn't crash, it would produce an empty file.)
I inspected the generated instructions in the binary in "IDA Pro" and in the Visual Studio debugger, and could see some major problems with missing or misaligned symbols/addresses, resulting in some instructions being optimized out, which explained the SegFaults, but I was not able to understand why it was linked wrong.
I disabled the dead code option with `-Clink-dead-code=no` and everything worked!
I haven't had an issue with `link-dead-code` on Linux or Mac, or even on Windows under GNU, but it was consistently failing under MSVC.
I'll change the option to enable `link-dead-code` ONLY if not building for MSVC, and hopefully that will be good enough.
I've collected some debugging notes, which you can see here. I'll also link them in the bug report, but I'll update them with this information as well.
FYI: @tmandry @wesleywiser @petrhosek
Looks like I have the fix for `-Zinstrument-coverage` under MSVC! 🎉
@richkadel profiling does not work with windows-gnu when building profiler library with GCC (but it does with Clang) and is not tested on the CI. With GCC it segfaults like that https://reviews.llvm.org/D86405#2232228
I'll open PR to Rust side of the issue today, LLVM side is waiting on https://reviews.llvm.org/D86405.
@mati865 Thanks for the additional info. When you say "profiling" here, you're referring to PGO, right?
Just to be clear, my comment was not about Rust's PGO, but was about the Clang InstrProf coverage profiling feature, enabled by `rustc -Zinstrument-coverage`. I believe the InstrProf profiling depends on Clang and (I assume) would not work with GCC.
(I should say, InstrProf depends on LLVM, that is.)
@richkadel I'm talking about `llvm-project/compiler-rt/lib/profile`, which is used for both PGO and code coverage.
GCC somehow miscompiles that library so we cannot test it on CI. When that library is built by Clang (and used by rustc) code coverage doesn't cause segfaults in the resulting binaries.
Ok, yes. (The notes I linked above are pretty extensive, but include disassembled views of the results from both Clang and rustc.) At any rate, rustc appears to be working with that library now, as long as I disable `link-dead-code`. (Granted, I've only tested a few small binaries, but given all of the things I've tried, with no significant difference or improvement, I'm optimistic that `link-dead-code` was the reason. Whether this helps PGO or not, I cannot say.)
@richkadel glad my comment was helpful :+1:
So the approach I'm taking with tarpaulin is to use syn to find coverable lines so I don't need to link dead code and can get the same results. I've been partially taking this approach for code which can be missed out regardless of dead code. But a combination of linker issues with `-C link-dead-code` and users having issues with compile times and target directory sizes makes me feel this is the best solution (at least for tarpaulin).
Just to elaborate on "code which can be missed out regardless of dead code". If there's a generic function that's unused no code is generated so traditional coverage tools will miss it out without using source analysis to catch the code.
I am really looking forward to this llvm coverage stuff making it into stable rust, because then I can use this as a coverage collection backend and keep the source-analysis and reporting code on either end and run it without dead-code. :grin:
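The "unused generic" point can be sketched like this (the `largest` function is invented for illustration): a generic function that is never instantiated produces no monomorphized machine code, so a binary-level coverage tool has nothing to attribute to it.

```rust
// Hypothetical illustration: if `largest` is never called anywhere in
// the crate, no monomorphized copy of it is emitted into the binary,
// so a purely binary-level coverage tool cannot see that it is
// uncovered. Only source analysis (e.g. with `syn`) can still flag it.
#[allow(dead_code)]
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    items.iter().copied().fold(None, |best, x| match best {
        Some(b) if b >= x => Some(b),
        _ => Some(x),
    })
}
```

(If some other code in the crate did instantiate `largest::<i32>`, that one instantiation would appear in the binary and be countable; the uninstantiated cases would still be invisible.)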
I'm sorry if this is clearly mentioned in the llvm coverage notes (if so I missed it), but does llvm coverage account for condition coverage? Instead of just branch coverage (evaluating the boolean condition for a branch), condition coverage looks at the boolean subconditions: for `if a && b` it will report on `a` being true/false and `b` being true/false/X (X for short-circuiting).
Yes, Boolean short circuiting will definitely be "covered" (no pun intended) 😄
so just to be extra explicit if I have:

```rust
if a || b {
    println!("foo");
} else {
    println!("bar");
}
```

And test with only `a` toggling between true and false and everything else false, that would be 50% condition coverage, but 100% branch coverage (and also 100% line coverage).
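A minimal sketch of that scenario (the `choose` helper is invented for illustration) makes the distinction concrete:

```rust
// Hypothetical helper wrapping the `if a || b` example above. A test
// that only toggles `a` (with `b` always false) executes both branches,
// giving 100% branch and line coverage. However, `b` is either
// short-circuited (when `a` is true) or only ever observed as false,
// so its true case is never exercised and condition coverage is
// incomplete.
fn choose(a: bool, b: bool) -> &'static str {
    if a || b { "foo" } else { "bar" }
}
```

Calling `choose(true, false)` and `choose(false, false)` hits both the "foo" and "bar" arms without ever evaluating `b` as true.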
Doesn't this mean that you're missing one or more references that cause the linker to drop symbols that are being used? Note that profile instrumentation is currently incompatible with `--gc-sections`, which is why this issue wouldn't manifest on Linux even if it was present there.
@xd009642 - Some of the specifics are part of what I'm finally nailing down now, as the final phase of the implementation.
I expect to produce results that resemble the coverage spans produced by Clang, for cases where the syntax is similar (like an if/else, as in your example).
But (at least in my opinion) accurate coverage of the executable code regions is what's most important. Deciding what surrounding syntax elements should also be included is partly a question of style and readability.
It's easier for me to address your specific example in terms of the code regions and their coverage counts, which is how LLVM InstrProf encodes the coverage, than it is percentages and line counts, which are computed by LLVM tools after the fact. So hopefully you can extrapolate from this similar example:
```rust
if a == b || b == c {
    println!("foo");
} else {
    println!("bar");
}
```
Assuming `a` and `b` are the same value, then it doesn't matter what `c` is, so LLVM's coverage tools would report:

- `println!("foo")` was executed 1 time. (Like with the `||`, the code region may also include the `else` keyword, but I'll check out Clang's answer to this and probably replicate its approach.)
- `println!("bar")` was executed 0 times

Note, if this snippet represented the entire function, there's an implicit `return ()` here that might need to be accounted for, but without any actual source code to construct a "code region" for that, it may not show up in the `llvm-cov show` reports. Nevertheless, I think it may be important to report this since an early `return` (say, if there was one right after `println!("foo")`) would bypass the implicit return, and someone might want to know that.
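The early-`return` situation can be sketched with a hypothetical variant of the snippet that returns a value, so the two exit paths are visible as distinct regions:

```rust
// Hypothetical variant: an explicit early `return` in the "foo" arm
// bypasses the tail of the function, so the early-return path and the
// fall-through (implicit-return-like) path can carry different
// execution counts in a coverage report.
fn report(a: u32, b: u32, c: u32) -> &'static str {
    if a == b || b == c {
        return "foo"; // early return: skips the rest of the body
    }
    "bar" // tail expression, analogous to the implicit return path
}
```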
@petrhosek - That's really interesting. Now that you mentioned it, I finally see that a Rust, linker-dependent function called `gc_sections()` is called/enabled based on the value of `-Clink-dead-code` (but note, it's only enabled if link-dead-code is turned off; see my follow-up comment):
There are only about 3 or 4 places that the link-dead-code option affects `rustc`. I'm not sure what would happen if I just enabled or disabled the `gc_sections()` option, without changing the other link-dead-code dependencies.
I'm not sure I understand what you're implying about `--gc-sections` compatibility and Linux. The fact is, the `-Z instrument-coverage` implementation does work on Linux and Mac with `-C link-dead-code`, _and_ I've seen the coverage results on Linux with link-dead-code both enabled and disabled. Both worked, and they produced different coverage results consistent with expectations (showing zero coverage in one example, for dead code, and showing no coverage results for the dead code when link-dead-code is disabled).
I should also add that the `gc_sections()` function in the referenced `link.rs` snippet does not directly translate to the `--gc-sections` flag. It depends on which linker is used, and on the value of `keep_metadata`. Here's what rustc does for the MSVC linker:
(FYI, I had to edit the comment before last. I made a couple of incorrect statements originally.)
@richkadel awesome that makes sense and I can see how to derive other coverage metrics from the info :+1:.
Regarding dead code not causing issues on linux, maybe try with https://github.com/rust-ndarray/ndarray-linalg — I've found I can get issues with crates that use FFI libraries in their dependency tree. Although, there's always a chance this is some interaction of link-dead-code and the relocation-model I need to use in order to instrument the binaries with breakpoints (an issue not present with LLVM instrumentation).
I wonder if a possible solution is to apply link-dead-code only to the crate's libraries/binaries and not the dependencies. That would at least isolate the problem if (like my issue) it only applies to a subset of crates - but if one of them is in the dependency tree it impacts the whole project. EDIT: `cargo rustc` seems to be the solution to experiment with it.
FYI: As of PR #78267, LLVM InstrProf-based code coverage is "feature complete"! (I believe this last PR will be in the Rust `nightly` distribution channel tomorrow, Nov 7.) It has been tested on a few large crate implementations, but will need more real world testing. Please cc me on any bug reports.
Usage is now documented in The Rust Unstable Book.
@richkadel which version of `llvm-profdata` does it require to work? I installed the one in my distro's package manager and all it gives is an unhelpful message saying it's an incompatible version. When I'm on my machine later I can get the versions and error message.
Rust builds with LLVM 11 currently, so use that if you can. 10 may or may not also work.
Now `llvm-cov` should be shipped through `llvm-tools-preview`?
Yes! Thanks for the PR @dalance !
Most helpful comment
I think good code coverage support is the single most important piece of tooling missing in Rust.
When I run
cargo test
, and everything is green, I get a good feeling, but this feeling actually means nothing at all. To makecargo test
mean something, it would need to at least tell me how many code-paths of my application have been exercised.This information does not mean much either (just because a code-path was exercised does not mean that it was exercised for all inputs), but it would mean more than nothing, and it would make writing tests and asserting the value of test way easier.
IMO adding this kind of capability to `cargo test` (and yes, it should be part of cargo test; people should always know their code coverage) would be extremely valuable. There is only one other part of tooling infrastructure that would come close to this in value, and that would be an undefined behavior detector for all Rust code.
This might sound like a rant, but I just want to raise awareness that this is a very important issue at least for me (and, from the other issues being filed, for others as well) because it directly affects the correctness of Rust programs (we don't want them to only be memory safe, but also to have correct logic).
Maybe if `rustfmt`, clippy, and the `RLS` advance enough during the impl period, the Rust tool team could prioritize these two tools next? If I don't have precise auto-completion information or my source code is not perfectly formatted, well, I can live with that. But if my program has undefined behavior and I am not catching it because I am only testing 40% of my code-paths then... I am screwed. Does this make sense?