Jest: Investigate a new coverage system

Created on 1 Sep 2017 · 10 comments · Source: facebook/jest

For large-scale projects, instrumenting coverage for all files becomes quite slow, and even slower if you tell Jest to instrument **/*.js or other similar patterns so that you can get non-covered files as well.

By default, Jest only instruments required files, which means that if a file is never required because it has no tests, it will not show up in the list at all, producing inaccurate coverage percentages.
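For reference, this is the kind of configuration that triggers the slow path described above. `collectCoverageFrom` is Jest's real option for including files in the report even when no test requires them; the exact patterns here are the ones mentioned later in this thread, shown as a minimal sketch:

```javascript
// jest.config.js — a minimal sketch; the glob patterns are illustrative.
module.exports = {
  collectCoverage: true,
  // Instrument every JS file, not just the ones tests require, so that
  // files with no tests still appear (at 0%) in the coverage report.
  collectCoverageFrom: ['**/*.js', '!**/node_modules/**'],
};
```

With patterns like these, Jest must load and instrument every matching file, which is where the slowdown on large repos comes from.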

All 10 comments

I would recommend changing the coverage reporter to have an optional fast path that generates approximate coverage info (covered vs. uncovered/not executable). What do you think?

I'd rather try fixing the root problem. The approximated coverage only helps for non-covered files, so if a lot of files are being required, the problem will persist. Given that we only require per-line coverage, a simpler, faster instrumentation might be possible.
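To illustrate how much cheaper a coarse instrumentation could be than full statement/branch instrumentation, a transform could prepend a single marker statement per module instead of wrapping every statement. This is a hypothetical sketch, not Jest's or istanbul's actual transform; the function and global names are made up:

```javascript
// Hypothetical files-only instrumentation: one marker statement per module
// instead of counters around every statement, branch, and function.
global.__coveredFiles__ = global.__coveredFiles__ || new Set();

function instrumentFileOnly(source, filename) {
  // A single prepended statement records that the module ran at all.
  return (
    `global.__coveredFiles__.add(${JSON.stringify(filename)});\n` + source
  );
}

// Simulate requiring an instrumented module:
eval(instrumentFileOnly('const x = 1 + 1;', '/app/util.js'));
console.log(global.__coveredFiles__.has('/app/util.js')); // true
```

Per-line coverage would need one marker per line rather than per file, but the same idea applies: skip the expensive statement/branch bookkeeping when only coarse data is needed.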

From my experience, people use coverage completely differently depending on the project they work on.
For library/shared/utility code, branch/statement/function coverage is important, and people often aim for 100% coverage (think React, Jest, Immutable kinds of projects).

For application code, 100% branch coverage is impossible and just wrong. When you move fast, have millions of lines of app code, and build features only to deprecate them next month, coverage is not something you'll use to measure quality. It can give some helpful signal, but per-line coverage is more than enough there.

There's another case that involves webdriver tests. There we only want 'per file' coverage, to know which files were executed during the webdriver test run (so we can re-trigger the test the next time there's any change in the corresponding file).

Based on this, I think we should have 3 levels of coverage:

  1. current coverage (branches/statements/lines/functions)
  2. lines only
  3. files only

^ works for me, although I would probably scrap (3) and leave it up to the user to reduce line coverage (2) to file coverage (3) through a results processor.
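Reducing line coverage to file coverage in a results processor could look roughly like this. The sketch assumes the istanbul-style coverage map shape, where each file entry has an `s` object mapping statement IDs to hit counts; the function name is made up:

```javascript
// Reduce an istanbul-style coverage map to the list of files that were
// actually executed (at least one statement hit at least once).
function filesExecuted(coverageMap) {
  return Object.keys(coverageMap).filter((file) => {
    const statementHits = coverageMap[file].s || {};
    return Object.values(statementHits).some((count) => count > 0);
  });
}

// Example: one executed file, one required-but-never-run file.
const coverage = {
  '/app/a.js': { s: { 0: 2, 1: 0 } },
  '/app/b.js': { s: { 0: 0 } },
};
console.log(filesExecuted(coverage)); // [ '/app/a.js' ]
```

A custom reporter for the webdriver case (3) could do the same reduction and emit just the file list.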

I only mentioned 3. because line coverage generation and collection is pretty slow, which can time out already-slow webdriver tests.

do you think we can reuse parts of flow for instrumenting files?

I agree with @mjesun, and this is a topic I've struggled with more than once across multiple systems at Spotify. The rolling coverage (coverage based on what files are required) that is typical of many systems is simply inaccurate and hard to reason about. But when I set up istanbul (with mocha at the time) to instrument ALL files, it would have increased test run times 6x, so I did not re-attempt.

My opinion is that line coverage is not going to offer accurate coverage numbers and I would rather know the true coverage than have some percentage that doesn't reflect reality.

Enforcing 100% coverage on a large application, as @aaronabramov says, is probably not likely. But the important part is having consistently accurate numbers. All too often I see coverage changes that are hard to reason about and I wish we could just standardize on actual accurate full coverage.

Actually, just backing up a little bit, I have a suspicion that we are collecting coverage for node_modules (or other irrelevant code) which is why we found coverage collection to be slow. Miguel, could you (internally at FB) show me the list of files we are trying to collect coverage from at FB that showed this to be slow?

I configured it using two patterns: **/*.js + !**/node_modules/**. We can revisit that, but I'd say even if we do this, it's not going to work for really large repos like the ones we have.

@mjesun would you mind sharing the full list of files with me in an internal paste?

Would #7062 alleviate this?
