Node: coverage: NODE_V8_COVERAGE for Node's test suite, remaining blockers

Created on 18 Sep 2018 · 29Comments · Source: nodejs/node

I'm in the process of trying to collect coverage for Node.js' own internal libraries, and am running into a few significant problems:

[x] line counts appear to be all over the place; also we don't collect coverage for the wrap itself on several internal modules. Here's output from assert.js as an example.
- [x] with https://github.com/nodejs/node/pull/23941, we are now collecting coverage for the top level of internal modules.
- [x] it's still somewhat difficult to figure out the initial offset of files, this is being worked on in (https://github.com/nodejs/node/pull/23837, and https://github.com/nodejs/node/pull/21573).
- [x] ~coverage can't currently be collected or functions with return statements~ coverage isn't collected for closing braces in if statements, see: https://bugs.chromium.org/p/v8/issues/detail?id=8381 (@hashseed, @schuay).
  - [ ] coverage isn't collected for throw statement.
[x] the performance of merging 2800 reports is atrocious, I think this is partially because files like loader.js need to get merged over and over again, and also have many 1000s of blocks.
- @demurgos' work has significantly improved merging performance.
[x] I'm seeing a few failures for tests that instrument code used by coverage itself; specifically tests that mock the events module.

would love some help getting unblocked from these two issues, very excited to move our own test suite towards built in coverage.

CC: @addaleax, @demurgos, @hashseed, @TimothyGu, @schuay

Version: >v10.10.0
Platform :all platforms
Subsystem: test

coverage test

Source

bcoe

❤7

Most helpful comment

I'm not sure which pull request landed to address the issue (@mhdawson) _but_, it seems like we no longer have issues related to the script wrapper \o/

There still seem to be issues with the terminal } in if statements, but I think that this patch should address the issue -- _not quite sure why it's not cropping up for functions now._

There are still some tests that fail under coverage, which we could work on over time, but @mhdawson indicates that this was the case with the existing approach as well.

bcoe on 21 Dec 2018

👍2

All 29 comments

For point 2 (merging coverage reports), one possibility would be to make V8's behavior here configurable and add a mode that skips all of the attempted reductions of reported coverage. Another alternative may be to disable these entirely.

Do you have more details about 1 and 3? Is the coverage reported by V8 incorrect?

schuay on 18 Sep 2018

You could definitely play around with how V8's prepares the data, e.g. remove the calls to merging ranges.

hashseed on 18 Sep 2018

@hashseed @schuay I could get V8 up and running on my machine again and take a stab, how might we configure this setting. My concern would be screwing with the format that Chromium expects in inspector.

bcoe on 21 Sep 2018

Well we could make this a flag in either V8 or Devtools protocol so that it does affect Devtools' use case.

hashseed on 21 Sep 2018

@hashseed correct me if I'm wrong, but if we didn't collapse together ranges, you'd have every range in every processes' output, regardless of whether each block/function is executed. So, merging would become as simple as walking through the two files and adding them together in unison?

bcoe on 21 Sep 2018

Not sure I understand fully what you mean. We employ a bunch of passes to reduce the ranges we need to pass through the protocol. If these passes do not run, you would get ranges for every block and every function, but some would nest inside others.

hashseed on 21 Sep 2018

@hashseed merging two reports in Istanbul is very simple, because you know you'll have the same lines, blocks, and functions:

FileCoverage.prototype.merge = function (other) {
    var that = this;
    Object.keys(other.s).forEach(function (k) {
        that.data.s[k] += other.s[k];
    });
    Object.keys(other.f).forEach(function (k) {
        that.data.f[k] += other.f[k];
    });
    Object.keys(other.b).forEach(function (k) {
        var i,
            retArray = that.data.b[k],
            secondArray = other.b[k];
        if (!retArray) {
            that.data.b[k] = secondArray;
            return;
        }
        for (i = 0; i < retArray.length; i += 1) {
            retArray[i] += secondArray[i];
        }
    });
};

https://github.com/istanbuljs/istanbuljs/blob/master/packages/istanbul-lib-coverage/lib/file.js#L248

I think potentially we'd get to the same spot with the V8 reports if we didn't reduce ranges?

bcoe on 21 Sep 2018

Yes. If you don't reduce the ranges in here we should get ranges that you could overlay on top of each other from different runs.

hashseed on 24 Sep 2018

I submitted a fix for the performance issues related to merging. With this fix, parsing the JSON files for Node and npm takes twice as long as merging. It's important to keep the file sizes down. If disabling post-processing increases the file sizes, it will reduce the overall performance.

The only thing that bothers me right now is that it relies on details of the JS grammar that are not enforced by the the V8 profiler. The issue revolves around choosing how to merge partially overlapping coverage ranges. There's a need to split one of the two ranges. As a result you get either two ranges with the same endOffset or two ranges with the same startOffset. I use the variant with the same endOffset ("left bias") because I consider two ranges with the same startOffset to be a bug. The only cases I found were either bugs or questionable (reported here). I'd prefer V8 to guarantee that it never produces such ranges (add this check to their test suite) instead of relying on the details of the JS grammar. Alternatively, I'd like to find a case proving me wrong: it would mean that my approach is flawed and that you need to disable post-processing or use the AST.

demurgos on 10 Oct 2018

Out of curiosity, what are realistic absolute JSON parse & merge times (and related source file size) for larger workloads?

schuay on 10 Oct 2018

On my computer, the npm cli takes around 1.5 seconds to be loaded from disk into buffers, 7 seconds to be parsed and about 3.5 seconds to be merged. You can clone this repo and use node ./tools/run-rough-benchmarks.js and node ./tools/run-benchmarks.js (you probably first need npm install && cd ts && npm install). The numbers are similar for the Node codebase (the files are smaller but there are more of them).

File sizes (V8 coverage):
npm: 1045 files, 746Mo
node: 3550 files, 502Mo

While merging the files, I do many comparisons between ranges. About 1% of those detect ranges with the same start offset (so it's already pretty rare).

demurgos on 10 Oct 2018

Interesting. Intuitively it should be possible to guarantee all ranges have a unique start position.

But I wonder whether deterministic reported ranges (i.e. irrespective of the actual execution trace) wouldn't give you more benefits through a vastly simplified merge algorithm and only slightly larger JSON files.

schuay on 10 Oct 2018

I am not against having an option to disable post processing: it would be good to compare the results. If we have the guarantee that the ranges do not depend on the runtime (same code block -> same ranges), the merge time can probably be reduced: once you matched the functions, all you have to do is zip the ranges and sum their counts (no splits or nesting to handle).

Edit: I forgot to mention something. In practice, merge times will be even shorter. For the benchmarks, I merge everything, but in reality is there's a step between parsing and merging to filter out all the irrelevant scripts (node_modules, etc.).

demurgos on 10 Oct 2018

I generated coverage for Node with this patch. There are no more ranges with the same startOffset in the output.

demurgos on 10 Oct 2018

Great news! Please let me know if you run into any other issues.

schuay on 11 Oct 2018

@schuay @hashseed here's the reproduction I promised (like two weeks ago at this point sorry, demonstrating the strange output I'm seeing for coverage from Node.js itself:

https://github.com/bcoe/node-coverage-debug

the top level coverage seems to be missing for many built-in modules, e.g., lib/fs.js.
I've included screen shots, you can also clone the node-coverage-debug repo and open the HTML files.
the JSON files in the root directory are the output from the inspector.

the odd thing i we're getting _some_ coverage for the file in question, just not the top level information it would appear.

bcoe on 24 Oct 2018

@bcoe we see some test failures in the existing coverage generation that drives benchmarking.nodejs.org. It's not ideal but just to say it might not be a blocker depending on the failures.

mhdawson on 24 Oct 2018

@mhdawson good news, I seem to have addressed most of the failing tests here:

https://github.com/nodejs/node/pull/23941

@ryzokuken I've added reference to your two tickets, which I'm patiently awaiting and will help standardize offsets.

@schuay, @hashseed I've opened https://bugs.chromium.org/p/v8/issues/detail?id=8381 which relates to some offset issues we've talked about; I'd love to take this task on, once we decide on an approach ... and I believe @iansu wanted to learn about contributing to V8, so I can perhaps work on the task with him.

bcoe on 28 Oct 2018

@bcoe Yes, I'm definitely still interested in working on this with you.

iansu on 29 Oct 2018

🎉1

this will hopefully address the issues with return statements: https://chromium-review.googlesource.com/c/v8/v8/+/1339119

bcoe on 16 Nov 2018

🎉1

Hi guys. I've read this thread but still in doubt. This example https://nodejs.org/api/inspector.html#inspector_cpu_profiler gives wrong lineNumbers and columnNumbers. Does this issue (this thread) reflects this problem as well? Or should I create new issue for that?
@bcoe are you aware of the problem I mentioned? Is there any workaround or something.. BTW Profiler's methods (bestEffort and preciseCoverage) give wrong ranges as well.

anurbol on 17 Nov 2018

@anurbol until https://github.com/nodejs/node/pull/23837 and https://github.com/nodejs/node/pull/21573 land, you need to take into account the size of require('module').wrapper[0].

This logic is implemented in c8.

bcoe on 17 Nov 2018

@bcoe Thank you so much for the response! Unfortunately even when taking into account that line (28 characters and one line, right?), lineNumbers and columnNumbers do not match actual code. It feels like node does something else with code, like reformatting or inserting some other code.

I also failed to find the workaround in c8, can you please point me to the exact file and line? I would be sooo happy if I got this working!

anurbol on 17 Nov 2018

@anurbol could you perhaps bring this conversation to an issue in c8, and provide a reproduction? There are many other potential issues with test coverage, if you're using TypeScript, as an example, you will need to apply source-maps before the output matches your expectations.

bcoe on 17 Nov 2018

@bcoe well, I do use typescript and sourcemaps and encounter errors with them as well. However even very simple, bare-js cases seems to not work properly. Made an issue. Any help is extremely appreciated! And I am still interested in the workaround you mentioned, that is implemented in c8 (but I failed to find it).

anurbol on 17 Nov 2018

@bcoe Since #23941 landed, can we at least check off one more box in the description here?

Trott on 21 Nov 2018

@Trott apologize for the slow reply, this checks off:

line counts appear to be all over the place; also we don't collect coverage for the wrap itself on several internal modules. Here's output from assert.js as an example.

bcoe on 2 Dec 2018

I'm not sure which pull request landed to address the issue (@mhdawson) _but_, it seems like we no longer have issues related to the script wrapper \o/

There still seem to be issues with the terminal } in if statements, but I think that this patch should address the issue -- _not quite sure why it's not cropping up for functions now._

There are still some tests that fail under coverage, which we could work on over time, but @mhdawson indicates that this was the case with the existing approach as well.

bcoe on 21 Dec 2018

👍2

I believe this is resolved as we switched to the native coverage report.

Please reopen if I am mistaken.