Stryker: Stryker 4.0.0 beta, strange diff in number of survived when running Stryker on @stryker-mutator/core

Created on 3 Sep 2020  路  11Comments  路  Source: stryker-mutator/stryker

When running Stryker 4.0.0-beta.4 on @stryker-mutator/core itself (without checker), it yields a small difference in nr of survived mutants.

| Stryker version | concurrency | Coverage Analysis | OS | score | killed | timeout | survived | no cov | error | duration |
|---|---|---|---|----|---------|----------|-----------|------------|----------|--|
| 4.0.0-beta.4 | 7 | all | WSL2 | 67.21% | 1861 | 47 | 491 | 440 | 1 | 00:27:20 |
| 4.0.0-beta.4 | 7 | perTest | WSL2 | 67.07% | 1865 | 39 | 495 | 440 | 1 | 00:05:55 |
| 4.0.0-beta.4 | 3 | all | WSL2 | 67.03% | 1863 | 40 | 496 | 440 | 1 | 00:27:55 |
| 4.0.0-beta.4 | 3 | perTest | WSL2 | 66.92% | 1865 | 35 | 499 | 440 | 1 | 00:06:30 |
| 4.0.0-beta.4 | 1 | perTest | WSL2 | 66.89% | 1864 | 35 | 500 | 440 | 1 | 00:14:06 |
| 4.0.0-beta.4 | 1 | all | WSL2 | 66.89% | 1864 | 35 | 500 | 440 | 1 | 01:00:30 |

This needs investigation.

See uploaded files.

stryker-core.zip

馃悰 Bug

Most helpful comment

I'm not sure about express, because I didn't get the feeling that the memory leak was the problem there.

But for Stryker: yes, the same result each time. I didn't do a "stress test" yet to see how high I can go with --concurrency before timeouts occur again. That can also be good to know, because if that is for example --concurrency 8 then I think I wouldn't feel comfortable with a default concurrency of 7 (n - 1 where n is nr of logical cores).

All 11 comments

There seems to be a fairly simple explanation for it. There are 4 more mutants survived and 4 more killed in the perTest scenario. These were probably reported as timeout in the all scenario. Will test again with a lower --concurrency today, just to make sure

Just did the test with concurrency 3 and updated the results above. Timeouts remain inconsistent.

During the "all" with "concurrency 3" run, I got an out-of-memory error. This might explain the inconsistency in timeouts. It might mean that test runner processes are spending more and more time on garbage collection. Needs investigation


Out of memory error

Mutation testing  [=====================================             ] 73% (elapsed: ~17m, remaining: ~6m) 1758/2400 tested (390 survived, 23 timed out)09:40:59 (5861) WARN ChildProcessProxy Child process [pid 6369] exited unexpectedly with exit code null (SIGABRT). Last part of stdout and stderr was:
        18: 0xed6fa0 v8::internal::JSObject::DefineOwnPropertyIgnoreAttributes(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::Maybe<v8::internal::ShouldThrow>, v8::internal::JSObject::AccessorInfoHandling) [/home/nicojs/.nvm/versions/node/v12.18.3/bin/node]
        19: 0xed71d9 v8::internal::JSObject::DefineOwnPropertyIgnoreAttributes(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::internal::JSObject::AccessorInfoHandling) [/home/nicojs/.nvm/versions/node/v12.18.3/bin/node]
        20: 0xed9ec8 v8::internal::JSReceiver::ValidateAndApplyPropertyDescriptor(v8::internal::Isolate*, v8::internal::LookupIterator*, bool, v8::internal::PropertyDescriptor*, v8::internal::PropertyDescriptor*, v8::Maybe<v8::internal::ShouldThrow>, v8::internal::Handle<v8::internal::Name>) [/home/nicojs/.nvm/versions/node/v12.18.3/bin/node]
        21: 0xeda666 v8::internal::JSReceiver::OrdinaryDefineOwnProperty(v8::internal::LookupIterator*, v8::internal::PropertyDescriptor*, v8::Maybe<v8::internal::ShouldThrow>) [/home/nicojs/.nvm/versions/node/v12.18.3/bin/node]
        22: 0xedab11 v8::internal::JSReceiver::DefineOwnProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSReceiver>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyDescriptor*, v8::Maybe<v8::internal::ShouldThrow>) [/home/nicojs/.nvm/versions/node/v12.18.3/bin/node]
        23: 0xedace0 v8::internal::JSReceiver::DefineProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>) [/home/nicojs/.nvm/versions/node/v12.18.3/bin/node]
        24: 0xc3bd16 v8::internal::Builtin_ObjectDefineProperty(int, unsigned long*, v8::internal::Isolate*) [/home/nicojs/.nvm/versions/node/v12.18.3/bin/node]
        25: 0x13cf119  [/home/nicojs/.nvm/versions/node/v12.18.3/bin/node]


        <--- Last few GCs --->

        [6369:0x2f155e0]   385190 ms: Scavenge 1828.0 (2065.0) -> 1814.2 (2066.5) MB, 14.9 / 0.0 ms  (average mu = 0.384, current mu = 0.408) allocation failure
        [6369:0x2f155e0]   385277 ms: Scavenge 1829.8 (2066.5) -> 1814.6 (2066.5) MB, 11.7 / 0.0 ms  (average mu = 0.384, current mu = 0.408) allocation failure
        [6369:0x2f155e0]   385371 ms: Scavenge 1830.1 (2066.5) -> 1815.5 (2067.0) MB, 13.9 / 0.0 ms  (average mu = 0.384, current mu = 0.408) allocation failure


        <--- JS stacktrace --->

        ==== JS stack trace =========================================

        Security context: 0x3514d43008d1 <JSObject>
            0: builtin exit frame: defineProperty(this=0x3514d4300969 <JSFunction Object (sfi = 0x48e19d8a249)>,0x2527d7fb2739 <Object map = 0x36834d005029>,0x074cfa388f51 <String[#19]: errorsWithCallStack>,0x2527d7fb2219 <JSFunction proxy (sfi = 0x74cfa388ff1)>,0x3514d4300969 <JSFunction Object (sfi = 0x48e19d8a249)>)

            1: wrapFunction(aka wrapFunction) [0x34646b6c7279] [/home/nicojs/stryker/node_mo...

         ChildProcessCrashedError: Child process [pid 6369] exited unexpectedly with exit code null (SIGABRT). Last part of stdout and stderr was:
        18: 0xed6fa0 v8::internal::JSObject::DefineOwnPropertyIgnoreAttributes(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::Maybe<v8::internal::ShouldThrow>, v8::internal::JSObject::AccessorInfoHandling) [/home/nicojs/.nvm/versions/node/v12.18.3/bin/node]
        19: 0xed71d9 v8::internal::JSObject::DefineOwnPropertyIgnoreAttributes(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::internal::JSObject::AccessorInfoHandling) [/home/nicojs/.nvm/versions/node/v12.18.3/bin/node]
        20: 0xed9ec8 v8::internal::JSReceiver::ValidateAndApplyPropertyDescriptor(v8::internal::Isolate*, v8::internal::LookupIterator*, bool, v8::internal::PropertyDescriptor*, v8::internal::PropertyDescriptor*, v8::Maybe<v8::internal::ShouldThrow>, v8::internal::Handle<v8::internal::Name>) [/home/nicojs/.nvm/versions/node/v12.18.3/bin/node]
        21: 0xeda666 v8::internal::JSReceiver::OrdinaryDefineOwnProperty(v8::internal::LookupIterator*, v8::internal::PropertyDescriptor*, v8::Maybe<v8::internal::ShouldThrow>) [/home/nicojs/.nvm/versions/node/v12.18.3/bin/node]
        22: 0xedab11 v8::internal::JSReceiver::DefineOwnProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSReceiver>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyDescriptor*, v8::Maybe<v8::internal::ShouldThrow>) [/home/nicojs/.nvm/versions/node/v12.18.3/bin/node]
        23: 0xedace0 v8::internal::JSReceiver::DefineProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>) [/home/nicojs/.nvm/versions/node/v12.18.3/bin/node]
        24: 0xc3bd16 v8::internal::Builtin_ObjectDefineProperty(int, unsigned long*, v8::internal::Isolate*) [/home/nicojs/.nvm/versions/node/v12.18.3/bin/node]
        25: 0x13cf119  [/home/nicojs/.nvm/versions/node/v12.18.3/bin/node]


        <--- Last few GCs --->

        [6369:0x2f155e0]   385190 ms: Scavenge 1828.0 (2065.0) -> 1814.2 (2066.5) MB, 14.9 / 0.0 ms  (average mu = 0.384, current mu = 0.408) allocation failure
        [6369:0x2f155e0]   385277 ms: Scavenge 1829.8 (2066.5) -> 1814.6 (2066.5) MB, 11.7 / 0.0 ms  (average mu = 0.384, current mu = 0.408) allocation failure
        [6369:0x2f155e0]   385371 ms: Scavenge 1830.1 (2066.5) -> 1815.5 (2067.0) MB, 13.9 / 0.0 ms  (average mu = 0.384, current mu = 0.408) allocation failure


        <--- JS stacktrace --->

        ==== JS stack trace =========================================

        Security context: 0x3514d43008d1 <JSObject>
            0: builtin exit frame: defineProperty(this=0x3514d4300969 <JSFunction Object (sfi = 0x48e19d8a249)>,0x2527d7fb2739 <Object map = 0x36834d005029>,0x074cfa388f51 <String[#19]: errorsWithCallStack>,0x2527d7fb2219 <JSFunction proxy (sfi = 0x74cfa388ff1)>,0x3514d4300969 <JSFunction Object (sfi = 0x48e19d8a249)>)

            1: wrapFunction(aka wrapFunction) [0x34646b6c7279] [/home/nicojs/stryker/node_mo...


    at ChildProcessProxy.handleUnexpectedExit (/home/nicojs/stryker/packages/core/src/child-proxy/ChildProcessProxy.js:32:37)
    at ChildProcess.emit (events.js:315:20)
    at maybeClose (internal/child_process.js:1021:16)
    at Socket.<anonymous> (internal/child_process.js:443:11)
    at Socket.emit (events.js:315:20)
    at Pipe.<anonymous> (net.js:674:12) {
  innerError: undefined,
  pid: 6369,
  exitCode: null,
  signal: 'SIGABRT'
}
Mutation testing  [===========================================       ] 86% (elapsed: ~22m, remaining: ~3m) 2077/2400 tested (439 survived, 25 timed out)09:45:55 (5861) WARN ChildProcessProxy Child process [pid 6482] ran out of memory. Stdout and stderr are logged on debug level.

@Lakitna I keep coming back to improving the default --concurrency to being n-1 where n is the number of _physical cores_ rather than _logical cores_. As you can see, the performance improvement of using concurrency 7 (default, I have 8 logical cores) or 3 (I have 4 physical cores) is not much. Yet the number of timeouts seems to be larger than nessesary.

For github actions (and other build servers?) this shouldn't matter, since they don't use hyperthreading: https://docs.github.com/en/actions/reference/virtual-environments-for-github-hosted-runners#supported-runners-and-hardware-resources

I agree the difference is not much so it can be a viable strategy. However, it feels like treating the symptoms instead of the disease.

The reason I feel like it's symptom treating is that it's also not stable with concurrency: physicalCoreCount - 1.

It took me way to long to come with this suggestion: We're facing issues in a mutli-threaded thing (It's actually multi-process鈥攕emantic details). Any time an issue arises in such a process the first thought should be race conditions. In this case, it's actually pretty easy to find out if we're dealing with a race condition. We can run with concurrency: 1.

I'll run the Express bench with concurrency: 1. See #2417 for results. @nicojs Maybe you can run with concurrency: 1 for Stryker core?

Sure, good idea! Will run that today. Starting with perTest

EDIT: "perTest" and "all" results are in, see the original post

I just finished the runs.

  • "all" run had 3 out of memory errors.
  • "perTest" run had 2 out of memory errors.

It's pretty clear to me we have a memory error. I think I also spotted the issue:

https://github.com/stryker-mutator/stryker/blob/4975259ac5f3c8bc76dfbdba75b370e59e1670bb/packages/core/src/child-proxy/ChildProcessProxyWorker.ts#L119-L126

We never clean these event listeners. I will change the code and try again.

The logging gave me a hint to look in that file:

        ==== JS stack trace =========================================

            0: ExitFrame [pc: 0x13cf019]
        Security context: 0x035d582c08d1 <JSObject>
            1: extendCommon(aka extendCommon) [0x208eefb9c1f1] [/home/nicojs/stryker/node_modules/sinon/lib/sinon/util/core/extend.js:~54] [pc=0x2b6bd532f77e](this=0x3965ea7804b1 <undefined>,0x3bfc7c0346c9 <JSFunction proxy (sfi = 0x3d39bc5fb8c9)>,0x3bfc7c034ed1 <JSArray[1]>,0x3bfc7c034f09 <JSFunction copyProperty (sfi = 0x2be809445e49)>)
            2: stub [0x28d02c6ea509] [/h...


    at ChildProcessProxy.handleUnexpectedExit (/home/nicojs/stryker/packages/core/src/child-proxy/ChildProcessProxy.js:32:37)

An update: I fixed 2 memory leaks and now Stryker is running again deterministically on its own codebase. Always the same output for --concurrency 7 and --concurrency 1. 35 timeouts.

The biggest memory leak I found in the way we cleared files from the require cache. We never cleared module.children. This bug has been here since the beginning.

See #2468 and #2467 .

cool so that should fix all these inconsistencies? (at least most of em)

That's awesome! I'll run the Express bench a few times to ensure that one is stable now too.

I'm not sure about express, because I didn't get the feeling that the memory leak was the problem there.

But for Stryker: yes, the same result each time. I didn't do a "stress test" yet to see how high I can go with --concurrency before timeouts occur again. That can also be good to know, because if that is for example --concurrency 8 then I think I wouldn't feel comfortable with a default concurrency of 7 (n - 1 where n is nr of logical cores).

Results of the "stress test":

| concurrency | timeout | duration |
|--|--|--|
| 9 | 35 | 4:20 |
| 10 | 44 | 4:26 |
| 11 | 161 | 5:16 |

Remember: 35 is the correct number of timeouts and 7 is the default concurrency for my hardware.

As you can see (and expect), more concurrency might mean that it takes longer. You can also clearly see that the number of timeouts explode when you keep adding processes.

I'm pretty happy with these test results. It seems that we have some leeway factored into our default timeout setting. As long as you don't have a memory leak in your test suite (or Stryker itself 馃槄), you're fine.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

simondel picture simondel  路  17Comments

kmdrGroch picture kmdrGroch  路  19Comments

Lakitna picture Lakitna  路  97Comments

davesag picture davesag  路  17Comments

jeznag picture jeznag  路  17Comments