Node: Investigate flaky test/async-hooks/test-callback-error

Created on 7 Jun 2017  ·  66Comments  ·  Source: nodejs/node

  • Version: v9.0.0-pre
  • Platform: osx1010
  • Subsystem: test, async_hooks, child_process

https://ci.nodejs.org/job/node-test-commit-osx/10336/nodes=osx1010/console

not ok 167 async-hooks/test-callback-error
  ---
  duration_ms: 60.132
  severity: fail
  stack: |-
    timeout
CI / flaky test macos post-mortem test

Most helpful comment

This has been fixed since @rvagg ran these two commands on the host:

launchctl unload -w /System/Library/LaunchAgents/com.apple.ReportCrash.plist
sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.ReportCrash.Root.plist

Note that they need to be run from a GUI login or else they error out.

There's an issue open to get this into the provisioning for OS X machines. But for now, the problem is resolved, so I'm going to close this.

All 66 comments

I think we should make the test more robust by only running the spawnSync part if process.argv[2] is not set. The way it is currently written, If there is a bug it will cause a recursion resulting in a timeout.

Happened again today:

https://ci.nodejs.org/job/node-test-commit-osx/10360/nodes=osx1010/console

not ok 167 async-hooks/test-callback-error
  ---
  duration_ms: 60.68
  severity: fail
  stack: |-
    timeout

/cc @nodejs/async_hooks

It might be good to split the three test cases in that file into three separate tests/test files so that we can see which one is the one that's timing out. (I'm pretty sure it's test_callback_abort.)

could add console.log('staring X'/'stopping X')
Cooking now.

So finally I have data, seems like it is test_callback_abort. Maybe the time to create a coredump? anyway it fluctuates from 19s to over 60s. @AndreasMadsen what can we do? skip this case on macOS?

+ python tools/test.py -p tap --mode=release --timeout 35 async-hooks/test-callback-error
TAP version 13
1..1
not ok 1 async-hooks/test-callback-error
  ---
  duration_ms: 31.297
  severity: fail
  stack: |-
    2017-06-09T04:03:20.423Z init
    2017-06-09T04:03:20.457Z required
    2017-06-09T04:03:20.467Z required ./init-hooks
    2017-06-09T04:03:20.467Z staring test_init_callback
    2017-06-09T04:03:20.547Z finished test_init_callback
    2017-06-09T04:03:20.548Z staring test_callback
    2017-06-09T04:03:20.626Z finished test_callback
    2017-06-09T04:03:20.626Z staring test_callback_abort
    2017-06-09T04:03:51.210Z finished test_callback_abort
    assert.js:586
    assert.ifError = function ifError(err) { if (err) throw err; };
                                                      ^

    Error: spawnSync /Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/out/Release/node ETIMEDOUT
        at exports._errnoException (util.js:1012:11)
        at spawnSync (child_process.js:546:20)
        at Object.<anonymous> (/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/test/async-hooks/test-callback-error.js:56:12)
        at Module._compile (module.js:569:30)
        at Object.Module._extensions..js (module.js:580:10)
        at Module.load (module.js:503:32)
        at tryModuleLoad (module.js:466:12)
        at Function.Module._load (module.js:458:3)
        at Function.Module.runMain (module.js:605:10)
        at startup (bootstrap_node.js:158:16)
  ...
+ NOK=1
+ echo '1   OK: 0   NOT OK: 1   TOTAL: 100'
1   OK: 0   NOT OK: 1   TOTAL: 100
+ for i in '`seq $RUN_TIMES`'
+ python tools/test.py -p tap --mode=release --timeout 35 async-hooks/test-callback-error
TAP version 13
1..1
ok 1 async-hooks/test-callback-error
  ---
  duration_ms: 19.5
  ...
+ OK=1
+ echo '2   OK: 1   NOT OK: 1   TOTAL: 100'
2   OK: 1   NOT OK: 1   TOTAL: 100
+ for i in '`seq $RUN_TIMES`'
+ python tools/test.py -p tap --mode=release --timeout 35 async-hooks/test-callback-error
TAP version 13
1..1
ok 1 async-hooks/test-callback-error
  ---
  duration_ms: 18.279
  ...
+ OK=2
+ echo '3   OK: 2   NOT OK: 1   TOTAL: 100'
3   OK: 2   NOT OK: 1   TOTAL: 100
+ for i in '`seq $RUN_TIMES`'
+ python tools/test.py -p tap --mode=release --timeout 35 async-hooks/test-callback-error
TAP version 13
1..1
ok 1 async-hooks/test-callback-error
  ---
  duration_ms: 17.398
  ...
+ OK=3
+ echo '4   OK: 3   NOT OK: 1   TOTAL: 100'
4   OK: 3   NOT OK: 1   TOTAL: 100

/cc @DavidCai1993

Skip this case on macOS?

We don't have a good reason to expect spawnSync to timeout on MacOS. Also, we do we get an ETIMEDOUT now, if it is not a recursion then I would expect the same error as before.

-1 on skip this case on macOS at this time, too. Seems that the problem is somehow related to process.abort() ...?

We don't have a good reason to expect spawnSync to timeout on MacOS. Also, we do we get an ETIMEDOUT now, if it is not a recursion then I would expect the same error as before.

I added a timeout on spawnSync

const c3 = spawnSync(process.execPath, a, { timeout: 30000 });
console.log((new Date()).toISOString() + ' finished test_callback_abort');
assert.ifError(c3.error);

and ifError so we'd see if and why, the call failed.
I could add:

if (common.isOSX && c3.error && c3.error.code === 'ETIMEDOUT') return common.skip()

Or simply return (no skip) so we acknowledge this might fail, but essentially that's ok.

I ran this test 1000 times on my Mac, all tests finished within 200-400 ms.

Thanks for explaining the ETIMEOUT. But I still think, we don't have a good reason to expect spawnSync to timeout on MacOS.

does duration_ms: 31.297 mean 31 seconds? That is confusing.

@Trott why does the abort tests not run on the CI? At least that is what the test/README.md says.

I think the idea was to avoid possibly creating a bunch of core files.

I see, then maybe we should move the test to that directory.

does duration_ms: 31.297 mean 31 seconds? That is confusing.

Yes, that's what it means, and yes it's confusing, but duration_ms is TAP format, not something we determined. So we may be kinda sorta stuck with it (unless/until TAP changes and the Jenkins plugin changes too).

I see, then maybe we should move the test to that directory.

I'd agree with that. Curious if we ever run the abort tests, like maybe before a release or something? @nodejs/release

Oh, by the way, another one on CI today: https://ci.nodejs.org/job/node-test-commit-osx/10391/nodes=osx1010/console

not ok 175 async-hooks/test-callback-error
  ---
  duration_ms: 60.25
  severity: fail
  stack: |-
    timeout

I'd agree with that.

Or at least moving the abort-on-uncaught-exception thing into its own test? I don't know what I think anymore.

Looks like what we have here may be a bug in Node.js master branch on macOS and not a problem with the test.

With this one-line test file called test.js:

throw new Error('foo');

If I run it with current master:

$ ./node --abort-on-uncaught-exception test.js
Uncaught Error: foo

FROM
Object.<anonymous> (/Users/trott/io.js/test.js:1:1)
Module._compile (module.js:1:1)
Object.Module._extensions..js (module.js:1:1)
Module.load (module.js:1:1)
tryModuleLoad (module.js:1:1)
Function.Module._load (module.js:1:1)
Function.Module.runMain (module.js:1:1)
startup (bootstrap_node.js:1:1)
bootstrap_node.js:1:1
Received signal 4 <unknown> 000100c84632

==== C stack trace ===============================

 [0x000100c83442]
 [0x7fff8200652a]
 [0x000000000000]
 [0x000100954b4b]
 [0x0db5628843fd]
 [0x0db562987c7a]
 [0x0db56293be7c]
[end of stack trace]
Illegal instruction: 4
$

There is a looooong pause between printing [end of stack trace] and Illegal instruction: 4.

If I use Node.js 8.1.0, I get similar output, but there's no long pause.

Can someone try to replicate that just to confirm it's not some oddity in my own setup?

@Trott I cannot reproduce it on macOS with neither the master branch, nor 8.1.0. It just aborts immediately.

@aqrln Do you get the Illegal instruction: 4 as the last line of your output? (If I run as root, I don't get that line and it exits immediately.)

I can't reproduce on master, this is what I'm getting:

Uncaught Error: foo

FROM
Object.<anonymous> (/Users/Andreas/Sites/node/node-master/test.js:1:1)
Module._compile (module.js:1:1)
Object.Module._extensions..js (module.js:1:1)
Module.load (module.js:1:1)
tryModuleLoad (module.js:1:1)
Function.Module._load (module.js:1:1)
Function.Module.runMain (module.js:1:1)
startup (bootstrap_node.js:1:1)
bootstrap_node.js:1:1
Received signal 4 <unknown> 000100c3eff2

==== C stack trace ===============================

 [0x000100c3de24]
 [0x7fff8d423bba]
 [0x7fff5fbfdea0]
 [0x0001009205cb]
 [0x0608333043fd]
 [0x060833407c7a]
 [0x0608333bbe7c]
[end of stack trace]
[1]    44955 illegal hardware instruction  ./node --abort-on-uncaught-exception test.js

@Trott

Uncaught Error: foo

FROM
Object.<anonymous> (/Users/alex/projects/aqrln/node/test.js:1:1)
Module._compile (module.js:1:1)
Object.Module._extensions..js (module.js:1:1)
Module.load (module.js:1:1)
tryModuleLoad (module.js:1:1)
Function.Module._load (module.js:1:1)
Function.Module.runMain (module.js:1:1)
startup (bootstrap_node.js:1:1)
bootstrap_node.js:1:1
Received signal 4 <unknown> 000100c3e022

==== C stack trace ===============================

 [0x000100c3ce54]
 [0x7fffaedc8b3a]
 [0x7fff5fbfd960]
 [0x00010092014b]
 [0x23ac401043fd]
 [0x23ac40207c7a]
 [0x23ac401bbe7c]
[end of stack trace]
[1]    32357 illegal hardware instruction  ./node --abort-on-uncaught-exception test.js

Doesn't take long for me:

./node --abort-on-uncaught-exception test.js  0.04s user 0.74s system 46% cpu 1.669 total

Last line:

  • zsh: [2] 39788 illegal hardware instruction (core dumped) ./node --abort-on-uncaught-exception test.js
  • bash: Illegal instruction: 4 (core dumped)

However the generated cores are 1.3GB, so if your mac has an HDD then that might explain the delay.

➜  node git:(master) ✗ ❯ ls -lh /cores
-r--------  1 gib  admin   1.3G  9 Jun 19:11 core.39856
-r--------  1 gib  admin   1.3G  9 Jun 19:11 core.39868

Would be good to try on test-requireio-osx1010-x64-1 in the Build/CI cluster to see if it behaves like my machine or like (apparently) everyone else's. If it matters (and it might), I'm running OS version 10.11.6. And we're running 10.10 in CI, it appears. So if everyone else is running something newer, maybe that's part of the issue?

I'm running MacOS 10.12.3

Now trying on 10.12.5 and unable to reproduce. Anyone have OS X/macOS 10.11 or 10.10 lying around and can try the test.js thing above?

Finally got into the CI machine. It doesn't delay as long as my 10.11 machine, but there is a tiny delay that doesn't exist on my 10.12 machine. So, uh, inconclusive, I guess? Harumph.

Something changed on the CI machine from:
https://ci.nodejs.org/job/node-stress-single-test/1286/nodes=osx1010/console

TAP version 13
1..1
ok 1 async-hooks/test-callback-error
  ---
  duration_ms: 18.83
  ...
+ OK=22
+ echo '25   OK: 22   NOT OK: 3   TOTAL: 100'
25   OK: 22   NOT OK: 3   TOTAL: 100
+ for i in '`seq $RUN_TIMES`'
+ python tools/test.py -p tap --mode=release --timeout 35 async-hooks/test-callback-error
TAP version 13
1..1
ok 1 async-hooks/test-callback-error
  ---
  duration_ms: 17.609
  ...
+ OK=23
+ echo '26   OK: 23   NOT OK: 3   TOTAL: 100'
26   OK: 23   NOT OK: 3   TOTAL: 100
+ for i in '`seq $RUN_TIMES`'
+ python tools/test.py -p tap --mode=release --timeout 35 async-hooks/test-callback-error

to
https://ci.nodejs.org/job/node-stress-single-test/1287/nodes=osx1010/console

TAP version 13
1..1
ok 1 async-hooks/test-callback-error # skip test_callback_abort timed out
  ---
  duration_ms: 0.319
  ...
+ OK=34
+ echo '34   OK: 34   NOT OK: 0   TOTAL: 100'
34   OK: 34   NOT OK: 0   TOTAL: 100
+ for i in '`seq $RUN_TIMES`'
+ python tools/test.py -p tap --mode=release --timeout 35 async-hooks/test-callback-error
TAP version 13
1..1
ok 1 async-hooks/test-callback-error # skip test_callback_abort timed out
  ---
  duration_ms: 0.318
  ...
+ OK=35
+ echo '35   OK: 35   NOT OK: 0   TOTAL: 100'
35   OK: 35   NOT OK: 0   TOTAL: 100
+ for i in '`seq $RUN_TIMES`'
+ python tools/test.py -p tap --mode=release --timeout 35 async-hooks/test-callback-error

@refack Yikes, yes, that is a change (for the better).

I don't trust unexplained changes :shipit:

It is back 😨

not ok 177 async-hooks/test-callback-error
  ---
  duration_ms: 60.75
  severity: fail
  stack: |-
    timeout

https://ci.nodejs.org/job/node-test-commit-osx/10457/nodes=osx1010/

And now gone again after 45 minutes and three-out-of-three failures. Am I wrong that when it happens, it seems to happen consistently, all the time? Until it stops happening altogether?

Let's land #13559

And now gone again after 45 minutes and three-out-of-three failures. Am I wrong that when it happens, it seems to happen consistently, all the time? Until it stops happening altogether?

There is definitely a correlation.

And now gone again after 45 minutes and three-out-of-three failures. Am I wrong that when it happens, it seems to happen consistently, all the time? Until it stops happening altogether?

There is definitely a correlation.

That may be problematic in terms of running a stress test.

I got a good picture of the all elusive timeout:
https://ci.nodejs.org/job/node-stress-single-test/nodes=osx1010/1303/console
It's a bit longer than the ones above, and it demonstrates the pause between the end of stack (visible), and the last line .*Illegal instruction.* (didn't make it in time).
P.S. internal test timeout set to 15s.

TAP version 13
1..1
not ok 1 async-hooks/test-callback-error
  ---
  duration_ms: 17.732
  severity: fail
  stack: |-
       ==== stdout ====  

       ==== stderr ====  
    Error: test_callback_abort
        at ActivityCollector.initHooks.oninit.common.mustCall (/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/test/async-hooks/test-callback-error.js:35:45)
        at ActivityCollector.oninit (/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/test/common/index.js:517:15)
        at ActivityCollector._init (/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/test/async-hooks/init-hooks.js:170:10)
        at init (async_hooks.js:467:43)
        at Object.emitInitS [as emitInit] (async_hooks.js:337:3)
        at Object.<anonymous> (/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/test/async-hooks/test-callback-error.js:37:17)
        at Module._compile (module.js:569:30)
        at Object.Module._extensions..js (module.js:580:10)
        at Module.load (module.js:503:32)
        at tryModuleLoad (module.js:466:12)
     1: node::Abort() [/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/out/Release/node]
     2: node::Chdir(v8::FunctionCallbackInfo<v8::Value> const&) [/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/out/Release/node]
     3: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) [/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/out/Release/node]
     4: v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) [/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/out/Release/node]
     5: v8::internal::Builtin_Impl_HandleApiCall(v8::internal::BuiltinArguments, v8::internal::Isolate*) [/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/out/Release/node]
     6: 0x2a86092043fd
     7: 0x2a86092f7c49
    Received signal 6

    ==== C stack trace ===============================

     [0x000100c4402b]
     [0x7fff8acf9f1a]
     [0x7fff5fbfcf60]
     [0x7fff882a59a3]
     [0x000100ac4e0d]
     [0x000100ac8a2e]
     [0x000100216c92]
     [0x0001002925d0]
     [0x000100291b59]
     [0x2a86092043fd]
     [0x2a86092f7c49]
    [end of stack trace]

    assert.js:60
      throw new errors.AssertionError({
      ^

    AssertionError [ERR_ASSERTION]: Error: spawnSync /Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/out/Release/node ETIMEDOUT
        at Object.<anonymous> (/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/test/async-hooks/test-callback-error.js:75:12)
        at Module._compile (module.js:569:30)
        at Object.Module._extensions..js (module.js:580:10)
        at Module.load (module.js:503:32)
        at tryModuleLoad (module.js:466:12)
        at Function.Module._load (module.js:458:3)
        at Function.Module.runMain (module.js:605:10)
        at startup (bootstrap_node.js:158:16)
        at bootstrap_node.js:575:3
  ...
+ NOK=154
+ echo '159   OK: 5   NOT OK: 154   TOTAL: 200'
159   OK: 5   NOT OK: 154   TOTAL: 200
+ for i in '`seq $RUN_TIMES`'
+ python tools/test.py -p tap --mode=release --timeout 30 async-hooks/test-callback-error

I tried disabling core dumps (sudo sysctl -w kern.coredump=0) but that did not make the delay go away, in case anyone else is having the same (probably naive) thought I just did.

Well... #13559 should stop the CI from going red, but we still don't know what's the root cause for the timeouts

I think we have concluded that neither async_hooks or child_process is at fault, but it has something to do with MacOS and --abort-on-uncaught-exception. I have updated the labels accordingly.

I think we have concluded that neither async_hooks or child_process is at fault, but it has something to do with MacOS and --abort-on-uncaught-exception. I have updated the labels accordingly.

I'm not so sure, don't we have a test for just that (--abort-on-uncaught-exception works, and finishes in reasonable time)? And if not that's a nice first contribution...

I'm not so sure.

The test in https://github.com/nodejs/node/issues/13527#issuecomment-307440797 is pretty conclusive.

don't we have a test for just that (--abort-on-uncaught-exception works, and finishes in reasonable time)? And if not that's a nice first contribution...

We have tests, but they are not executed on the CI because they are the special abort directory. https://github.com/nodejs/node/blob/master/test/abort/test-abort-uncaught-exception.js

@nodejs/platform-macos

Well... #13559 should stop the CI from going red, but we still don't know what's the root cause for the timeouts

If all we want is to stop the CI from going red, maybe we should have a commit that does nothing (or nearly nothing) but update the relevant .status file. That more extensive change appears to have broken make test on macOS. :-(

If all we want is to stop the CI from going red, maybe we should have a commit that does nothing (or nearly nothing) but update the relevant .status file. That more extensive change appears to have broken make test on macOS. :-(

(To be a bit more clear, though: No shame intended! A lot of great work has happened so far and some really important information has been uncovered. These things happen.)

This is interesting, I think:

Using Node.js 8.1.2 installed by nvm:

$ time node --abort-on-uncaught-exception -e 'throw new Error("foo")'
Uncaught Error: foo

FROM
[eval]:1:1
ContextifyScript.Script.runInThisContext (vm.js:44:33)
Object.runInThisContext (vm.js:116:38)
Object.<anonymous> ([eval]-wrapper:6:22)
Module._compile (module.js:569:30)
evalScript (bootstrap_node.js:432:27)
startup (bootstrap_node.js:139:9)
bootstrap_node.js:575:3
Illegal instruction: 4

real    0m0.222s
user    0m0.034s
sys 0m0.011s

222 ms execution time! No problem!

Same thing with Node.js compiled from master:

$ time ./node --abort-on-uncaught-exception -e 'throw new Error("foo")'
Uncaught Error: foo

FROM
[eval]:1:1
ContextifyScript.Script.runInThisContext (vm.js:1:1)
Object.runInThisContext (vm.js:1:1)
Object.<anonymous> ([eval]-wrapper:1:1)
Module._compile (module.js:1:1)
evalScript (bootstrap_node.js:1:1)
startup (bootstrap_node.js:1:1)
bootstrap_node.js:1:1
Received signal 4 <unknown> 000100c87282

==== C stack trace ===============================

 [0x000100c86092]
 [0x7fff8200652a]
 [0x000000000000]
 [0x000100954a1b]
 [0x05ef3e4843fd]
[end of stack trace]
Illegal instruction: 4

real    0m12.560s
user    0m0.034s
sys 0m0.012s

More than 12.5 seconds execution time. Yikes!

Interesting that the stack traces are so very different....

I did a git checkout upstream/v8.x-staging && git checkout a021720 && ./configure && make -j4 to create 8.1.2, but it still takes a long time to exit:

$ ./node -v && time ./node --abort-on-uncaught-exception -e 'throw new Error("foo")'
v8.1.2
Uncaught Error: foo

FROM
[eval]:1:1
ContextifyScript.Script.runInThisContext (vm.js:44:33)
Object.runInThisContext (vm.js:116:38)
Object.<anonymous> ([eval]-wrapper:6:22)
Module._compile (module.js:569:30)
evalScript (bootstrap_node.js:432:27)
startup (bootstrap_node.js:139:9)
bootstrap_node.js:575:3
Illegal instruction: 4

real    0m12.163s
user    0m0.035s
sys 0m0.011s

Now the stack traces match at least, but the time difference is certainly troubling.

When run as root, there is no delay and there is no "Illegal instruction: 4".

I think the issue is in V8 (or perhaps more precisely, an interaction of macOS with V8 code--not sure if the bug is in one place, the other, or really neither). If I don't pass --abort-on-uncaught-exception on to V8, I don't get the ~12 second delay on exit. @nodejs/v8

cc @gsathya @ajklein

If I'm reading the history correctly, this seems like a bug with --abort-on-uncaught-exception and not async-hooks. Correct?

Re-routing to @schuay @fhinkel

If I'm reading the history correctly, this seems like a bug with --abort-on-uncaught-exception and not async-hooks. Correct?

Yes. Although I now have the confounding additional information that the issue manifests itself with only some users on my laptop. If I switch to other users, the issue goes away. It's entirely possible the problem is entirely my setup, but I suspect it's not quite that simple because we're seeing this in CI as well.

OK, it's almost definitely user setup at this point. When I run the program, while it's hanging, a process named ReportCrash goes way up to 99% CPU. Main thing at this point is probably to:

  • figure out how to disable it or give it the permissions it needs or whatever, then...
  • ..fix the issue on CI, then...
  • ...document the issue somewhere so Collaborators like me and @benjamingr who run into the issue know what to do :-D

Although...if I run as root, things work just fine, no delay, etc. So maybe the real issue is to figure that out rather than disabling crash reporting... ¯\_(ツ)_/¯

Again, I wouldn't be going on and on about problems with my own personal setup if it weren't for the fact that we're seeing it in CI too, and we need to fix it there....

I wonder what the node::Chdir is about in the stack trace...

Logging on to the CI machine, I don't get the long delay with simple cases like that described in
https://github.com/nodejs/node/issues/13527#issuecomment-307440797. But I do get a long delay if I do what the test does from the command line:

test-nodesource-osx1010-x64-1:osx1010 iojs$ time ./node --abort-on-uncaught-exception test/async-hooks/test-callback-error.js test_callback_abort
Error: test_callback_abort
    at ActivityCollector.initHooks.oninit.common.mustCall (/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/test/async-hooks/test-callback-error.js:30:45)
    at ActivityCollector.oninit (/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/test/common/index.js:517:15)
    at ActivityCollector._init (/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/test/async-hooks/init-hooks.js:170:10)
    at init (async_hooks.js:467:43)
    at Object.emitInitS [as emitInit] (async_hooks.js:337:3)
    at Object.<anonymous> (/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/test/async-hooks/test-callback-error.js:33:17)
    at Module._compile (module.js:569:30)
    at Object.Module._extensions..js (module.js:580:10)
    at Module.load (module.js:503:32)
    at tryModuleLoad (module.js:466:12)
 1: node::Abort() [/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/./node]
 2: node::Chdir(v8::FunctionCallbackInfo<v8::Value> const&) [/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/./node]
 3: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) [/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/./node]
 4: v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) [/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/./node]
 5: v8::internal::Builtin_Impl_HandleApiCall(v8::internal::BuiltinArguments, v8::internal::Isolate*) [/Users/iojs/build/workspace/node-stress-single-test/nodes/osx1010/./node]
 6: 0x3a7608e043fd
 7: 0x3a7608ef7c49
Received signal 6

==== C stack trace ===============================

 [0x000100c4402b]
 [0x7fff8d42bf1a]
 [0x000000000002]
 [0x7fff8a9d79a3]
 [0x000100ac4e0d]
 [0x000100ac8a2e]
 [0x000100216c92]
 [0x0001002925d0]
 [0x000100291b59]
 [0x3a7608e043fd]
 [0x3a7608ef7c49]
[end of stack trace]

Abort trap: 6

real    2m2.042s
user    0m0.072s
sys 0m0.051s
test-nodesource-osx1010-x64-1:osx1010 iojs$ 

And if I run top -o cpu in another terminal, it shows ReportCrash at the top, unsurprisingly.

I don't have root or sudo access on the machine so I can't see what happens if I run as root and I can't experiment with using launchctl to try to disable ReportCrash. (Running launchctl list without root doesn't show ReportCrash.) @nodejs/build

Assuming you're using test-requireio-osx1010-x64-1, then https://github.com/nodejs/build/pull/753#issuecomment-307750195 is the discussion around getting root.

@rvagg could you put the test key in the rvagg user? Presumably we'd also need the password to run sudo commands with it, IDK how we handle that normally.

...
real    2m2.042s
...

😵

If the plan is to move to MacStadium and not use requireio for Mac testing, then this may all be a moot point, at least in terms of CI. How soon can we move to MacStadium?

Good news! @rvagg reports that disabling ReportCrash with launchctl reduces the time from over two minutes to just 220ms when running time ./node --abort-on-uncaught-exception test/async-hooks/test-callback-error.js test_callback_abort. So this should be enough to turn CI green for OS X. All that's needed is to document this in the ansible scripts that build out the OS X hosts. That's a PR for the build repo.

This has been fixed since @rvagg ran these two commands on the host:

launchctl unload -w /System/Library/LaunchAgents/com.apple.ReportCrash.plist
sudo launchctl unload -w /System/Library/LaunchDaemons/com.apple.ReportCrash.Root.plist

Note that they need to be run from a GUI login or else they error out.

There's an issue open to get this into the provisioning for OS X machines. But for now, the problem is resolved, so I'm going to close this.

Was this page helpful?
0 / 5 - 0 ratings