Julia: Coverage misses lines

Created on 19 Jul 2018  Â·  40Comments  Â·  Source: JuliaLang/julia

For this file: https://github.com/invenia/Memento.jl/blob/master/src/stdlib.jl

The coverage report is:

        - if VERSION > v"0.7.0-DEV.2980"
        -     import Base.CoreLogging:
        -         AbstractLogger,
        -         handle_message,
        -         min_enabled_level,
        -         shouldlog,
        -         global_logger,
        -         Debug
        - 
        -     const LEVEL_MAP = Dict{AbstractString}
        -     struct CoreLogger <: AbstractLogger
        -     end
        - 
        -     min_enabled_level(logger::CoreLogger) = Debug
        4     shouldlog(logger::CoreLogger, arags...) = true
        - 
        -     function handle_message(::CoreLogger, cl_level, msg, mod, group, id, filepath, line; kwargs...)
        2         logger = getlogger(mod)
        -         level = lowercase(string(cl_level))
        2         log(logger, logger.record(logger.name, level, getlevels(logger)[level], msg))
        -     end
        - 
        -     function substitute!()
        1         global_logger(CoreLogger())
        -         notice(getlogger(@__MODULE__), "Substituting global logging with Memento")
        -     end
        - else
        -     function substitute!()
        -         warn(
        -             getlogger(@__MODULE__),
        -             "Global logging substitution is not support for julia $VERSION"
        -         )
        -     end
        - end
        - 

Notice that level = lowercase(string(cl_level)) is between two covered lines but somehow isn't reported as covered.

bug

Most helpful comment

@sbromberger Let me explain a bit longer as to why I think that what I wrote is 100% relevant to the problem. I need to elaborate a bit for this, so please bear with me...

When Julia is aksed to generate coverage data, it outputs coverage files for every source file (see the description of this issue for an example). In there, it records (slightly simplified) for every line whether it contains code or not; if it contains code, it also records how often that line was executed, as a non-negative integer. The number 0 indicates a line is code, but was not executed (= dead code).

Now, what happens in practice is that Julia reports a lot of lines which you and me would clearly consider as "this is code" instead as "this is not code at all". The result is that these lines do not contribute for coverage. Now, this essentially happens in two cases:

  1. Dead functions: if a function is never called (and hence Julia never JITs it / never generates LLVM bitcode for it), then it is recorded as "not code"
  2. If code is inlined, it often (always?) is marked as "not code".

These two cases have ultimately (I think) the same origin (but really, it doesn't matter, so let's not squabble about it), but have very different effects:

  • case 1 is not affected by inlining being on or off at all
  • case 2 is affected by inlining, which is why using --inline=no usually is a good workaround (@maxbennedich it would be helpful to see an example where it *did not help; care to elaborate?)
  • case 1 is highly undesirable, because it makes it hard to find dead functions. Indeed, you could have 100 lines of active code and 1 million lines of dead code, and still get 100% coverage reported
  • case 2 is often a bit confusing, but usually far less problematic when it comes to coverage reporting; you usually can still tell if some segment of code is being executed or not.

Now, what happened until recently is that Coverage.jl for Julia 0.7 and 1.0 simply passed on the reported coverage data to Codecov. This resulted in reports like this one where you see that line 27 clearly contains code, but the report says it is not code. While in this newer report, line 27 is marked as "code that was not executed".

But for Julia 0.6, Coverage.jl had (has) a function amend_coverage_from_src! which, as its name suggests, tries to "improve" the coverage data, in an attempt to fix case 1. What recently changed is that I fixed this function to also work in Julia 0.7 / 1.0. And this in turn, combined with the inlining problems reported in this issue here, causes the undesirable Coverage drops you and others are experiencing. Relevant is specifically my merged PR https://github.com/JuliaCI/Coverage.jl/pull/179, which fixed various compatibility issues with Julia 0.7 resp. 1.0.

What does amend_coverage_from_src! do? It tries to amend the coverage report, by looking for lines which are code, but which are not marked as code. Unfortunately, this function is too aggressive, as it marks all such lines as dead code -- even those which are due to case 2 above (or, depending on whom one wants to blame, Julia is not reporting inlined code lines correctly, which e.g. gcov for C/C++ code manags -- that's what this issue here on Julia repository is about).

Hence my proposed fix: teach amend_coverage_from_src! to be less aggressiv and distinguish between case 1 and case 2 above via a heuristic: if a function contains at least one line that is marked as code, then don't "amend" its coverage data. I am pretty sure this would fix all your coverage problems -- though of course ultimately one has to try it to be sure. This is what
https://github.com/JuliaCI/Coverage.jl/pull/188 starts, but as I wrote above, it is not quite complete yet, as now it again amends too little code; but as described, this could be greatly improved with some effort.

All 40 comments

Now is perhaps a good time to update LineInfoNode to track line-number ranges in the parser (but might be hard still for the optimizer to handle updating this information). And see also https://github.com/JuliaLang/julia/pull/11802/files

Perhaps we should inject this information before optimization (like we are doing at the llvm level), as this'll problem will only get worse as our optimizer gets better. We could more-easily do it then at the basic-block level. This would be more likely to require recompiling the system image to switch the flag though.

We are seeing the same thing starting in Julia 0.7. Every Julia 0.7 and 1.0 project with test coverage I've seen suffer from the same thing. I've tried with --inline=no but it's still incorrect for us.

Any updates on this issue?
I haven't been able to use Coveralls or Codecov.io since v0.7. This was actually very useful in developing tests for my packages. Without that I have no way to find which lines of codes are actually covered. Is there a way to do that and I'm missing it out?

I don't know whether this is related, but LightGraphs just dropped to from 99.8% to 66% codecov (and the reported coverage gaps don't make sense).

I have the same problem in one of my packages. It simply misses lines, as in the OP.
Any news on this?

To clarify, this issue is certainly amplified by recent changes to https://github.com/JuliaCI/Coverage.jl These are my "fault", although I "only" enabled code that was used on Julia 0.6 to also work correctly on Julia 0.7 and 1.0, which made the issue much more visible. Some discussions on mitigating the problem:

Hey folks, this is really getting frustrating. LightGraphs dropped from 100% to 68% code coverage several months ago, and I can't use the --inline=no trick because 1) it results in an inference error (that doesn't occur with inlining), and 2) travis times out.

Right now all our PRs are failing because we require codecov to be > 97%. I'd rather not change that, because it's an important metric. Also, it makes the project look bad to show a red "68%" coverage badge, especially when we've spent all this time getting coverage up to 100%.

What workarounds are available right now?

(Disclaimer: This doesn’t actually answer the core issue.)

On your --inline=no PR (https://github.com/JuliaGraphs/LightGraphs.jl/pull/1072), Travis passes on
Julia 0.7 but fails on Julia nightly. Perhaps as a short term fix you could
edit your .travis.yml to allow failures on Julia nightly?

On Sun, Dec 2, 2018 at 18:58 Seth Bromberger notifications@github.com
wrote:

Hey folks, this is really getting frustrating. LightGraphs dropped from
100% to 68% code coverage, and I can't use the --inline=no trick because
1) it results in an inference error (that doesn't occur with inlining), and
2) travis times out.

Right now all our PRs are failing because we require codecov to be > 97%.
I'd rather not change that, because it's an important metric. Also, it
makes the project look bad to show a red "68%" coverage badge.

What workarounds are available right now?

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/JuliaLang/julia/issues/28192#issuecomment-443554574,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AFXArdY5knKt5DUZvQ_yPnOkmvmJ3fuqks5u1GkRgaJpZM4VW4vt
.

@DilumAluthge thanks. That might address one concern, but all PRs will still fail because we have a policy of only allowing PRs that have an aggregate codecov of 99+% and that do not drop the overall codecov below 97%.

As an update, setting --inline=no and ignoring nightly failures reduces reported coverage:
https://github.com/JuliaGraphs/LightGraphs.jl/pull/1073

I don't know how to get proper reporting at this point.

Maybe you needs something like https://github.com/auchenberg/volkswagen for the interim ;-)

So one option would be to revert some of the Coverage.jl changes -- then dead / unused functions would not be reported as code, and you'd get your coverage up again (but at the cost of loosing the ability to find unused functions via coverage reports -- which to me is a major loss, but of course I realized this may be quite different for other people).

Another potential solution is described on https://github.com/JuliaCI/Coverage.jl/pull/188 -- that PR is incomplete, though, but I do try to sketch how one could complete it. The core idea there is that a function which has zero lines with coverage likely is unused, so we just mark all its lines as unexecuted code. So far, Coverage.jl does that for every line of code in every function; but now, we would not do so anymore: Rather, any function which has at least one line of code that was executed at least once, clearly is not unused, so we don't mark any code lines it as unexecuted (even though it may contain lots of those, due to the inlining issue discussed here). The missing part in the PR is that one has to do this recursively, to treat functions which are nested in other functions or in modules correctly. That shouldn't be too hard, I reckon (famous last words), but I currently have other far more pressing matters to take care of; so if somebody wants to take a stab at completing that PR in the meantime, please don't wait for me (I'll try to be available for discussion or reviews, if desired, though).

then dead / unused functions would not be reported as code, and you'd get your coverage up again

This isn't the problem, I don't think. We have 100% coverage of all functions in the tests. The problem is that something changed and the coverage is not being reported. Here's an example.

How is it that lines 26 and 28 both have coverage (3 separate times), but line 27 - which is not in the middle of any branch - has no coverage? I can tell you that this function is being tested here.

This has happened throughout the codebase and it all started a few months ago.

The core idea there is that a function with has zero lines with coverage likely is unused

Again, we don't have any of these to my knowledge. All functions are tested. We had a huge effort about a year and a half ago to ensure that all the code receives test coverage.

Someone mentioned that kwargs makes it difficult to troubleshoot this, so here's an example that may be more straightforward.

Coverage, and Test.

We never found a workaround. --inline=no didn't really make any difference. We gave up in the end and stopped tracking test coverage :(

@sbromberger Let me explain a bit longer as to why I think that what I wrote is 100% relevant to the problem. I need to elaborate a bit for this, so please bear with me...

When Julia is aksed to generate coverage data, it outputs coverage files for every source file (see the description of this issue for an example). In there, it records (slightly simplified) for every line whether it contains code or not; if it contains code, it also records how often that line was executed, as a non-negative integer. The number 0 indicates a line is code, but was not executed (= dead code).

Now, what happens in practice is that Julia reports a lot of lines which you and me would clearly consider as "this is code" instead as "this is not code at all". The result is that these lines do not contribute for coverage. Now, this essentially happens in two cases:

  1. Dead functions: if a function is never called (and hence Julia never JITs it / never generates LLVM bitcode for it), then it is recorded as "not code"
  2. If code is inlined, it often (always?) is marked as "not code".

These two cases have ultimately (I think) the same origin (but really, it doesn't matter, so let's not squabble about it), but have very different effects:

  • case 1 is not affected by inlining being on or off at all
  • case 2 is affected by inlining, which is why using --inline=no usually is a good workaround (@maxbennedich it would be helpful to see an example where it *did not help; care to elaborate?)
  • case 1 is highly undesirable, because it makes it hard to find dead functions. Indeed, you could have 100 lines of active code and 1 million lines of dead code, and still get 100% coverage reported
  • case 2 is often a bit confusing, but usually far less problematic when it comes to coverage reporting; you usually can still tell if some segment of code is being executed or not.

Now, what happened until recently is that Coverage.jl for Julia 0.7 and 1.0 simply passed on the reported coverage data to Codecov. This resulted in reports like this one where you see that line 27 clearly contains code, but the report says it is not code. While in this newer report, line 27 is marked as "code that was not executed".

But for Julia 0.6, Coverage.jl had (has) a function amend_coverage_from_src! which, as its name suggests, tries to "improve" the coverage data, in an attempt to fix case 1. What recently changed is that I fixed this function to also work in Julia 0.7 / 1.0. And this in turn, combined with the inlining problems reported in this issue here, causes the undesirable Coverage drops you and others are experiencing. Relevant is specifically my merged PR https://github.com/JuliaCI/Coverage.jl/pull/179, which fixed various compatibility issues with Julia 0.7 resp. 1.0.

What does amend_coverage_from_src! do? It tries to amend the coverage report, by looking for lines which are code, but which are not marked as code. Unfortunately, this function is too aggressive, as it marks all such lines as dead code -- even those which are due to case 2 above (or, depending on whom one wants to blame, Julia is not reporting inlined code lines correctly, which e.g. gcov for C/C++ code manags -- that's what this issue here on Julia repository is about).

Hence my proposed fix: teach amend_coverage_from_src! to be less aggressiv and distinguish between case 1 and case 2 above via a heuristic: if a function contains at least one line that is marked as code, then don't "amend" its coverage data. I am pretty sure this would fix all your coverage problems -- though of course ultimately one has to try it to be sure. This is what
https://github.com/JuliaCI/Coverage.jl/pull/188 starts, but as I wrote above, it is not quite complete yet, as now it again amends too little code; but as described, this could be greatly improved with some effort.

Thanks, @fingolfin - that was really helpful.

If it helps, --inline=no reduced our code coverage as well in LightGraphs. (See https://github.com/JuliaGraphs/LightGraphs.jl/pull/1073 for the report). Maybe that will give you additional insight?

Also, I'm confused:

But for Julia 0.6, Coverage.jl had (has) a function amend_coverage_from_src! which, as its name suggests, tries to "improve" the coverage data, in an attempt to fix case 1. What recently changed is that I fixed this function to also work in Julia 0.7 / 1.0.

But we had ~100% code coverage in 0.6 as well, so whatever happened wasn't a 1:1 reimplementation for 0.7/1.0.

In Julia 0.6, this inlining issue did not exist (AFAIU)

case 2 is affected by inlining, which is why using --inline=no usually is a good workaround (@maxbennedich it would be helpful to see an example where it *did not help; care to elaborate?)

Hmm, in our experiment it didn't make any difference whatsoever. Did we miss something? Here's the commit.

And here's the resulting coverage report.

"COVERAGE REMAINED THE SAME AT 87.328%"

Hmm, in our experiment it didn't make any difference whatsoever. Did we miss something?

--inline is not transferred to the process that Pkg.test starts in Julia 1.0.2.

In Julia 1.0.2, what is the recommmended way to run tests with inlining
turned off?

On Mon, Dec 3, 2018 at 12:19 Fredrik Ekre notifications@github.com wrote:

Hmm, in our experiment it didn't make any difference whatsoever. Did we
miss something?

--inline is not transferred to the process that Pkg.test starts in Julia
1.0.2.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/JuliaLang/julia/issues/28192#issuecomment-443791687,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AFXArfsmLmKdphYTISLzNYCmXk6cTFBWks5u1V0dgaJpZM4VW4vt
.

julia --inline=no --project --code-coverage=user test/runtests.jl

--inline is not transferred to the process that Pkg.test starts in Julia 1.0.2.

Ah, that would explain things :) That was Julia 0.7, but I guess same behavior there.

~Please don't confuse the issue with inlining with the issue reported here. The results are identical with inlining disabled.~

EDIT: nvm, too many coverages files and I forgot that when running interactively the coverage files aren't saved until exit

I think https://github.com/JuliaLang/julia/pull/30251 should much improve case 2 for --inline=yes|default.

Tried out #30251 with LLVM.jl: coverage on 1.0.2 is 70% down from 97% before the coveragepocalyps, on 6594acc254 it's back to 86% (note sure why clicking through to the files is broken, that's recent).

@maleadt Thanks, that's useful. I picked a file "at random" to get an idea how often case 1 (code is inlined and incorrectly marked as unexecuted code) vs case 2 (actual dead code) is. Specifically, I chose src/core/instructions.jl, which dropped from 93% coverage to still just 75% coverage with #30251.

Looking at it, we see that for example lines 43+44, which contain

predicate_int(inst::Instruction) = API.LLVMGetICmpPredicate(ref(inst))
predicate_real(inst::Instruction) = API.LLVMGetFCmpPredicate(ref(inst))

are marked as "not code" in the former, but are marked as "dead code" in the latter. I can't tell which one is correct, though, but I assume you can tell us?

If it is actual dead code, then this is case 2 in my description above, and so in a sense an improvement. If it is code that is not dead, then it means these functions were simply completely inlined. Unfortunately, this then is a situation which the heuristic I described as potential workaround above (and also see https://github.com/JuliaCI/Coverage.jl/pull/188) won't be able to deal with. And then I don't see anything that Coverage.jl could do to disambiguate the two cases. So I guess if this is happening, then the only fix for it is to stop using amend_coverage_from_src! in Coverage.jl, and live with being unable to detect dead functions in coverage reports :/. The only way to change I can think of right now then would be for Julia to generate "better" coverage reports which somehow allow disambiguating dead functions from completely inlined one.

According to https://github.com/maleadt/LLVM.jl/search?q=predicate_int&unscoped_q=predicate_int there is indeed no place calling predicate_int, so I still have some hopes, but I'd rather have @maleadt confirm this.

Thanks for the analysis, that is dead (well, untested) code indeed so definitely an improvement.
Looking through some other files, coverage statistics seem very reasonable now, and much more accurate wrt. dead code.

I found that amend_coverage missed a function (https://codecov.io/gh/maleadt/LLVM.jl/src/752aae1a6f86d17877633c6cea70a529a1ad05a4/src/passmanager.jl#L5), but that everything reported seemed correct.

Awesome, thanks @vtjnash!

Thanks @vtjnash ! Hmm, I wonder a bit about function define_transforms in https://codecov.io/gh/maleadt/LLVM.jl/src/752aae1a6f86d17877633c6cea70a529a1ad05a4/src/transform.jl -- it has line 82 marked as executed code, the rest as code that was not executed. How can that be? Perhaps because it's inside an @eval block?

How can that be? Perhaps because it's inside an @eval block?

That global code is executed during precompilation, in a different process, which doesn't get passed the coverage flag. The compiler also doesn't seem ready to handle this. But even then, these kind of codes are typically executed by the interpreter, which doesn't seem to record coverage at all.

@maleadt In #30381, I've configured julia_cmd() to pass more variables (they probably should have been environment variables, but I guess that's not possible now), and hacked around that codegen issue. Someday hopefully our compiler will be able to handle codegen more gracefully

Nice, thanks! The define_transforms function @fingolfin spotted above is now also properly covered.

Is there any solution now for julia 1 and 1.1?
Have the same problem with Juniper.jl.
This for example doesn't make sense:
https://codecov.io/gh/lanl-ansi/Juniper.jl/src/3c2634ed82d430871cef9a292926ee4fcc7cf21c/src/solver.jl#L9...26
Is inline=no implemented for julia v1? If yes how can I use it?
Currently running using Pkg; Pkg.clone(pwd()); Pkg.test("Juniper", coverage=true)

I haven't understood the extent of this problem until recently but this seems like Real Bad (TM)

Can this get a milestone?

What milestone and why?

Any milestone (I'm guessing it would have to be 2.0 because if I'm understanding correctly, the solution is to update LineNumberNode to Vector{LineNumberNode}) because code coverage is a critical "sanity check" for testing and writing stable/functional code.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

omus picture omus  Â·  3Comments

StefanKarpinski picture StefanKarpinski  Â·  3Comments

wilburtownsend picture wilburtownsend  Â·  3Comments

sbromberger picture sbromberger  Â·  3Comments

yurivish picture yurivish  Â·  3Comments