Stryker: Support incremental analysis

Created on 12 Jun 2017  路  16Comments  路  Source: stryker-mutator/stryker

The mutation testing process could be sped up by only testing changed mutants.

We could argue that a mutant should not have to be tested if:

  • It was killed in the previous run
  • And the test(s) that killed it still exists
  • And the same mutant was generated (same mutator, same file, same columns, same original code, same mutated code)

I would not advise to run incremental analysis during your build because it will probably not be flawless. One of the flaws of my proposed setup is that the test(s) that killed the mutant in the previous run could have changed. Right now we have no good way of knowing they were changed.

help wanted 馃殌 Feature request

Most helpful comment

Dear bot, don't close this, please

All 16 comments

With Karma using Chrome, I am seeing scan times for ~3,500 mutants take ~5 hours. Using PhantomJS decreases scan times by ~50%. Incremental analysis while not perfect would go a long way in improving performance while still killings a majority of the generated mutants.

Yeah, this would be pretty crucial. Stryker works well in my very small apps. My main prod app has ~1k unit tests. I had to reduce runners to 6 to allow it to run, and think after a few hours it was less than 50% done.

You could probably go two ways with this one- just running on changes from that commit, or using a history file like pitest for Java.

Yeah, this would be pretty crucial. Stryker works well in my very small apps. My main prod app has ~1k unit tests. I had to reduce runners to 6 to allow it to run, and think after a few hours it was less than 50% done.

Wow this is cool. Are you using coverage analysis for the running of the tests? If so, is it OK to share some statistics? How much faster is it with/without it. Also #366 could also make a big difference for these kind of projects.

With Karma using Chrome, I am seeing scan times for ~3,500 mutants take ~5 hours. Using PhantomJS decreases scan times by ~50%

What exactly do you mean with "scan times". Would that be the initial test run by any chance?

Another way of solving this is with sharding. So you could create a build per file to be mutated. You could share the same stryker.conf.js file, but choose to mutate different files using an env variable and parameterised build. We could force stryker to merge reports together in a big report. It might also be a good idea to split your project into different modules all together, but that's not up to me ;)

You could probably go two ways with this one- just running on changes from that commit, or using a history file like pitest for Java.

@jmpresso How does it work exactly? Does every developer need to build the history themselves using a full run once? Or is there a centralized store somewhere?

Wow this is cool. Are you using coverage analysis for the running of the tests? If so, is it OK to share some statistics? How much faster is it with/without it. Also #366 could also make a big difference for these kind of projects.

Not sure exactly what you mean there. It was taking so long to run I gave up on letting it finished. So currently I only have the istanbul type line coverage. I think karma runs in <2 minutes by itself, the code coverage is pretty fast too.

I do agree #366 would help as well. The temporary folder was also very large during my run.

What exactly do you mean with "scan times". Would that be the initial test run by any chance?
I don't want to speak for him, but I think he means the total execution time of the mutant testing.

Another way of solving this is with sharding. So you could create a build per file to be mutated. You could share the same stryker.conf.js file, but choose to mutate different files using an env variable and parameterised build. We could force stryker to merge reports together in a big report. It might also be a good idea to split your project into different modules all together, but that's not up to me ;)

I'm not totally sure how much that would help. Are you suggesting sharding it and running the shards in parallel? It might help, but might bog down the build machine. Perhaps we could make it work with SauceLabs. We'd also have to figure out different ports for all the different Karma servers, so it might get pretty messy.

I agree... modularizing absolutely helps, it's just not always possible.

@jmpresso How does it work exactly? Does every developer need to build the history themselves using a full run once? Or is there a centralized store somewhere?

Only the first time using the plugin. From there, it records the results in the history file, which could then gets versioned along with the code. I'm honestly not sure if/how it would handle merges and what not though.

FWIW, it mutates in memory also.

Not sure exactly what you mean there. It was taking so long to run I gave up on letting it finished. So currently I only have the istanbul type line coverage. I think karma runs in <2 minutes by itself, the code coverage is pretty fast too.

Wouw, 2 mins. Not sure there is something we can do in that case. Are there slow tests in the mix? Or just in general a large application footprint?

If you have slow tests, you could try to move them to a separate test suite (for example: "integration tests"). If you have a big application foodprint, it could help to bundle files together. For example: 1 bundle for all tests and one bundle for your production code. It might be that the majority of the time has to do with loading files in the browser. HTTP 1.1 is terribly bad optimized for a big number of files.

Are you suggesting sharding it and running the shards in parallel?

No, running the shards one by one, or on multiple build servers.

Only the first time using the plugin. From there, it records the results in the history file, which could then gets versioned along with the code.

@jmpresso Just to be sure: you wouldn't check in the history file (or files) into source control?

@nicojs Sorry for the confusion, by "scan times", I meant "execution time". Similar to @jmpresso the project I was referring to executes 5,000+ tests in 4 minutes using just karma. However, when executing Stryker with multiple runners, it takes 4+ hours.

I think #366 will help too. However, the biggest benefit seems like it would come from a history file feature like PiTest uses. This way previously generated mutants' results are used for files not modified since the last build. This history file IS committed to source control and referenced in the maven config settings in the pom.xml. This way the CI system can see it during builds and mutants generated in unmodified code are not rechecked again. Developers can keep this file up to date by running PiTest locally, generating a new pitestHistory.txt and committing it to source control.

In the example below if FooClass is not updated on a subsequent commit, PiTest will use its previous results from the pitestHistory.txt file instead of regenerating its mutants and running tests against them.

Example class...

package path.to.src.class;

import path.to.src.class.BarClass;
import path.to.src.class.BazClass;

public class FooClass {
    public BazClass methodFromClass(BarClass barClass) {
        return  Object.doStuff(BarClass)
    }
}

Resulting entry in pitestHistory.txt after running PiTest...

1
<classHistory><id hierarchicalHash="f9bf32"><classId hash="16367410"><name><name>path/to/src/class/FooClass</name></name></classId></id><coverageId>0</coverageId></classHistory>
<result><id><location><clazz><name>path/to/src/class/FooClass</name></clazz><method><name>methodFromClass</name></method><methodDesc>(Lpath/to/src/class/BarClass;)Lpath/to/src/class/BazClass;</methodDesc></location><indexes><int>7</int></indexes><mutator>org.pitest.mutationtest.engine.gregor.mutators.ReturnValsMutator</mutator></id><status numberOfTestsRun="0" status="NO_COVERAGE"><killingTest class="org.pitest.functional.Option$None"/></status></result>

I created a fork of Stryker in order to begin working on this feature. After cloning the Stryker project and running npm install when i try to run npm test in an individual package folder i see test failures due to no tests found (I think this may be due to the fact that the test runner is looking for .js files, but what exists in the different test directories are .ts files).

When i run npm test at the root level i am also seeing some build failures. I've attached the debug log that is produced. Any ideas why i'm unable to run the tests out-of-the-box so to speak?

2018-04-01T15_32_51_572Z-debug.log

@CarlosRosario Thanks for wanting to take a shot at this! Could you try running npm run build from the root of the project first? That should build (and link) every project in the repository. After that you should be able to run npm test for a specific package (or run even run the tests in debug mode if you use VS Code)

@simondel that is what @CarlosRosario did (npm test at the root does an npm run build first).

As far as I can tell it should work. We run this command on linux and windows machines to develop and build stryker. Maybe something was not installed correctly? Can you try clearing your node_modules and try again?

After running npm run build from the root directory you should be able to run npm test from any child package. Just remember to run npm run build in a child package if you change some code.

If you use vscode, you can open a child package in vscode and use F1 + "Run Build Task". It will start the incremental tsc compiler (tsc -w). With the debug pane on the left you can run/debug unit tests or integration tests (located in "test/unit" and "test/integration"). Breakpoints in ts code should work out-of-the box.

Please contact us on gitter if you have any problems or questions.

My bad! I do notice that your fork could be using typescript 2.8, which we don't support yet. Although the error is different then the one we got with typescript 2.8. You could try to clone the core repo instead of your fork to see if you can build/test that.

Hey, any news/plans about this?

I don't think Carlos is working on it any longer. I know one of us made a fork to work on it, and made some progress to at least creating the history file, but then it got put on hold. I think there also is still a lot of work needed to migrate to the new version of Stryker, and finish the implementation. It's possible it will get picked back up at some point, but not sure. I know we have also been pushing to start using faster test runners like Jest over Karma, to try and reduce the need.

TL/DR, unless the Stryker team is working on it, probably not, as I don't think some of us here that have been discussing it have made significant progress on it.

We're not actively working in it. We've also decided to start implementing #1514 since that will give us a lossless performance boost. This might be something we would consider doing after that.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Dear bot, don't close this, please

Was this page helpful?
0 / 5 - 0 ratings

Related issues

kmdrGroch picture kmdrGroch  路  5Comments

jeznag picture jeznag  路  6Comments

CheshireSwift picture CheshireSwift  路  4Comments

kmdrGroch picture kmdrGroch  路  5Comments

kmdrGroch picture kmdrGroch  路  4Comments