test-cli-node-options has been failing a lot on arm lately in CI. I assume it's the bug reported in https://github.com/nodejs/node/issues/21383 ("make test: use after free: parallel/test-cli-node-options").
Sample failure: https://ci.nodejs.org/job/node-test-commit-arm/20786/nodes=ubuntu1604-arm64/consoleText
Host: test-packetnet-ubuntu1604-arm64-2
not ok 184 parallel/test-cli-node-options
---
duration_ms: 1.266
severity: fail
exitcode: 1
stack: |-
assert.js:753
throw newErr;
^
AssertionError [ERR_ASSERTION]: ifError got unwanted exception: Command failed: /home/iojs/build/workspace/node-test-commit-arm/nodes/ubuntu1604-arm64/out/Release/node -e console.log("B")
#
# Fatal error in , line 0
# Check failed: (perf_output_handle_) != nullptr.
#
#
#
#FailureMessage Object: 0xffffe553e558
at ChildProcess.exithandler (child_process.js:294:12)
at ChildProcess.emit (events.js:189:13)
at maybeClose (internal/child_process.js:978:16)
at Socket.stream.socket.on (internal/child_process.js:396:11)
at Socket.emit (events.js:189:13)
at Pipe._handle.close (net.js:612:12)
...
And again. Probably time to mark this as flaky. Will open a PR.
test-packetnet-ubuntu1604-arm64-2
https://ci.nodejs.org/job/node-test-commit-arm/20790/nodes=ubuntu1604-arm64/consoleText
not ok 175 parallel/test-cli-node-options
---
duration_ms: 0.986
severity: fail
exitcode: 1
stack: |-
assert.js:753
throw newErr;
^
AssertionError [ERR_ASSERTION]: ifError got unwanted exception: Command failed: /home/iojs/build/workspace/node-test-commit-arm/nodes/ubuntu1604-arm64/out/Release/node -e console.log("B")
#
# Fatal error in , line 0
# Check failed: (perf_output_handle_) != nullptr.
#
#
#
#FailureMessage Object: 0xffffcb5fc4d8
at ChildProcess.exithandler (child_process.js:294:12)
at ChildProcess.emit (events.js:189:13)
at maybeClose (internal/child_process.js:978:16)
at Socket.stream.socket.on (internal/child_process.js:396:11)
at Socket.emit (events.js:189:13)
at Pipe._handle.close (net.js:612:12)
...
This is easily recreated if you run this a hundred times on FreeBSD:
node --perf-basic-prof -e 'console.log("B")'
#
# Fatal error in , line 0
# Check failed: (perf_output_handle_) != nullptr.
#
#
#
(gdb) where
#0 0x00000000015a8eb2 in v8::base::OS::Abort ()
#1 0x0000000000f79516 in v8::internal::PerfBasicLogger::PerfBasicLogger ()
#2 0x0000000000f7fb92 in v8::internal::Logger::SetUp ()
#3 0x0000000000f5b25c in v8::internal::Isolate::Init ()
#4 0x000000000117dd3f in v8::internal::Snapshot::Initialize ()
#5 0x0000000000ace03b in v8::Isolate::Initialize ()
#6 0x00000000008fba32 in node::NewIsolate ()
#7 0x00000000008fdc88 in node::Start ()
#8 0x00000000008fbfdb in node::Start ()
#9 0x00000000008ae095 in _start ()
#10 0x00000008022f0000 in ?? ()
#11 0x0000000000000000 in ?? ()
(gdb)
this corresponds to:
https://github.com/nodejs/node/blob/a6f69ebc05f4033b012b523661f6c3f62f3469b1/deps/v8/src/log.cc#L294-L296
truss output showed this:
24134: open("/tmp/perf-24134.map",O_WRONLY|O_CREAT|O_TRUNC,0666) ERR#13 'Permission denied'
#
# Fatal error in , line 0
# 24134: write(2,"\n\n#\n# Fatal error in , line 0"...,32) = 32 (0x20)
Check failed: (perf_output_handle_) != nullptr.24134: write(2,"Check failed: (perf_output_handl"...,47) =
47 (0x2f)
#
#
#
#FailureMessage Object: 0x7fffffffdc4024134: write(2,"\n#\n#\n#\n#FailureMessage Objec"...,45) = 45 (0x2d
)
$ l /tmp/perf-24134.map
-rw-r--r-- 1 iojs wheel 68110 May 14 2018 /tmp/perf-24134.map
$ whoami
freebsd
$
so this is basically a temp-file name collision with a file left behind by an earlier run. Unlikely to happen in production. The failing call is in PerfBasicLogger::PerfBasicLogger().
Am I reading this correctly: it's PID reuse which is causing the tmpfile name collision? Is it possible to configure FreeBSD to use a larger PID space so the reuse doesn't occur during our test runs? And does this point to a problem with the tempfile name generation?
@sam-github - yes, this is an issue with PIDs in small ranges that get re-used often; when those PIDs are composed into file names used by different users, that causes permission issues, leading to temp-file creation failure.
I don't know the admin command to change the PID pattern, and do not have permission to try it even if I did; though widening the PID range even by one digit should resolve the collision, I guess.
On a side note: though I recreated this on __freebsd__, and the issue is 100% the same as the original report, the original failure was reported on arm. So I am wondering whether fixing it at the host level would be the right thing or not.
Perhaps the test should use the env to set the temp directory to a per-user location, like $PWD/out/Release, to avoid the conflict? Perhaps in tools/test.py?
Though I'm a bit confused, are multiple users running the node tests at the same time? If there are multiple parallel test runs by the same user, permission problems shouldn't occur, though other conflicts could.
I don't know; let us ask @nodejs/testing and @Trott
For the first one (setting TMPDIR prior to the test) I tried, but it does not look like it is honored; the '/tmp' location seems to be assumed in the V8 code.
For this __freebsd__ failure instance: I used the freebsd user name, whereas the username for CI runs seems to be iojs, and that could have triggered this issue. However, given the failure is exactly the same in CI runs as well, its root cause can also be assumed to be the same; i.e., multiple usernames in CI that conflict when temp files are composed.
I believe FreeBSD uses random PIDs, which increases the likelihood of reuse.
I was hoping --perf-basic-prof could be instructed to put the .map file someplace else than the default (system temp dir) location with the TMPDIR environment variable, but alas, that appears to not be the case. If it were, we could do something like this perhaps:
if (!common.isWindows) {
process.env.TMPDIR = '/dev/null';
expect('--perf-basic-prof', 'B\n');
}
(Reading more carefully now, I see @gireeshpunathil already tried that too.)
@nodejs/v8 Is it possible to have the .map file end up someplace else, or is it pretty much always going to end up in /tmp?
It looks to be hardcoded :(
https://github.com/nodejs/node/blob/b2f74f73f3087f6c55cfab3b25282f890ffa3e14/deps/v8/src/log.cc#L280
IIRC the Linux perf tool (for which this file is generated) looks specifically for files with that name. IOW, the hardcoded pattern is probably necessary. Do we need to test --perf-basic-prof on FreeBSD?
> Do we need to test --perf-basic-prof on FreeBSD?
Failure due to this should never happen if the test is always run with the same user. This might be best as a wontfix. We might be overthinking this.
https://ci.nodejs.org/job/node-test-commit-arm/23063/nodes=ubuntu1604-arm64/testReport/(root)/test/parallel_test_cli_node_options_/
test-packetnet-ubuntu1604-arm64-2
assert.js:768
throw newErr;
^
AssertionError [ERR_ASSERTION]: ifError got unwanted exception: Command failed: /home/iojs/build/workspace/node-test-commit-arm/nodes/ubuntu1604-arm64/out/Release/node -e console.log("B")
#
# Fatal error in , line 0
# Check failed: (perf_output_handle_) != nullptr.
#
#
#
#FailureMessage Object: 0xffffc0d5e708
at ChildProcess.exithandler (child_process.js:295:12)
at ChildProcess.emit (events.js:193:13)
at maybeClose (internal/child_process.js:1000:16)
at Socket.<anonymous> (internal/child_process.js:405:11)
at Socket.emit (events.js:193:13)
at Pipe.<anonymous> (net.js:593:12)
I see one of these two options as the way out:
1. arm and freebsd systems have a much smaller kern.pid_max than Linux or AIX, which means this test is prone to PID collision on those systems. Manually increase this value on all of those systems: sysctl -w kern.pid_max=122880
2. Skip the --perf-basic-prof case for arm, and preferably freebsd too.

I favor the second option, but would like to know how others think.
https://ci.nodejs.org/job/node-test-commit-arm/23304/nodes=ubuntu1604-arm64/consoleFull
test-packetnet-ubuntu1604-arm64-2
00:07:41 not ok 192 parallel/test-cli-node-options # TODO : Fix flaky test
00:07:41 ---
00:07:41 duration_ms: 1.193
00:07:41 severity: flaky
00:07:41 exitcode: 1
00:07:41 stack: |-
00:07:41 assert.js:769
00:07:41 throw newErr;
00:07:41 ^
00:07:41
00:07:41 AssertionError [ERR_ASSERTION]: ifError got unwanted exception: Command failed: /home/iojs/build/workspace/node-test-commit-arm/nodes/ubuntu1604-arm64/out/Release/node -e console.log("B")
00:07:41
00:07:41
00:07:41 #
00:07:41 # Fatal error in , line 0
00:07:41 # Check failed: (perf_output_handle_) != nullptr.
00:07:41 #
00:07:41 #
00:07:41 #
00:07:41 #FailureMessage Object: 0xffffe5e1b4d8
00:07:41 at ChildProcess.exithandler (child_process.js:298:12)
00:07:41 at ChildProcess.emit (events.js:194:13)
00:07:41 at maybeClose (internal/child_process.js:1000:16)
00:07:41 at Socket.<anonymous> (internal/child_process.js:405:11)
00:07:41 at Socket.emit (events.js:194:13)
00:07:41 at Pipe.<anonymous> (net.js:593:12)
00:07:41 ...
The test launches more than 20 asynchronous Node.js processes. That seems like it may be too many for under-powered machines (like Raspberry Pi). I wonder if a sufficient fix is to either split it up into 2 or more separate test files that each launch only 4-10 processes asynchronously, or else simply alter the test to use execFileSync(). That second approach might make the test simpler, and of course will still test NODE_OPTIONS. Opening a PR soon....
https://ci.nodejs.org/job/node-test-commit-arm/23354/nodes=ubuntu1604-arm64/consoleFull
test-packetnet-ubuntu1604-arm64-2
00:06:58 not ok 187 parallel/test-cli-node-options # TODO : Fix flaky test
00:06:58 ---
00:06:58 duration_ms: 1.163
00:06:58 severity: flaky
00:06:58 exitcode: 1
00:06:58 stack: |-
00:06:58 assert.js:769
00:06:58 throw newErr;
00:06:58 ^
00:06:58
00:06:58 AssertionError [ERR_ASSERTION]: ifError got unwanted exception: Command failed: /home/iojs/build/workspace/node-test-commit-arm/nodes/ubuntu1604-arm64/out/Release/node -e console.log("B")
00:06:58
00:06:58
00:06:58 #
00:06:58 # Fatal error in , line 0
00:06:58 # Check failed: (perf_output_handle_) != nullptr.
00:06:58 #
00:06:58 #
00:06:58 #
00:06:58 #FailureMessage Object: 0xffffc1535638
00:06:58 at ChildProcess.exithandler (child_process.js:298:12)
00:06:58 at ChildProcess.emit (events.js:194:13)
00:06:58 at maybeClose (internal/child_process.js:998:16)
00:06:58 at Socket.<anonymous> (internal/child_process.js:403:11)
00:06:58 at Socket.emit (events.js:194:13)
00:06:58 at Pipe.<anonymous> (net.js:593:12)
00:06:58 ...
@Trott This seems like it may or may not be a V8 bug. Is there any machine on which this reproduces sufficiently often that one could debug it?
@addaleax test-packetnet-ubuntu1604-arm64-2 is the one it's happened on the last few times so that might be a good candidate.
$ ssh test-packetnet-ubuntu1604-arm64-2 cat /proc/cpuinfo | grep ^processor | wc -l
96
parallelism isn't a problem on these machines fwiw
@addaleax I've added you to [email protected] https://github.com/nodejs/build/issues/1747
If we're seeing errors limited to a single machine then we might be running into hardware problems, we've been tracking these but never been able to nail it firmly down https://github.com/nodejs/node/issues/23913
https://ci.nodejs.org/job/node-test-commit-freebsd/25352/nodes=freebsd11-x64/consoleFull
test-digitalocean-freebsd11-x64-1
00:28:53 not ok 242 parallel/test-cli-node-options
00:28:53 ---
00:28:53 duration_ms: 2.676
00:28:53 severity: fail
00:28:53 exitcode: 1
00:28:53 stack: |-
00:28:53 assert.js:769
00:28:53 throw newErr;
00:28:53 ^
00:28:53
00:28:53 AssertionError [ERR_ASSERTION]: ifError got unwanted exception: Command failed: /usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/node -e console.log("B")
00:28:53
00:28:53
00:28:53 #
00:28:53 # Fatal error in , line 0
00:28:53 # Check failed: (perf_output_handle_) != nullptr.
00:28:53 #
00:28:53 #
00:28:53 #
00:28:53 #FailureMessage Object: 0x7fffffffcfe0
00:28:53 at ChildProcess.exithandler (child_process.js:298:12)
00:28:53 at ChildProcess.emit (events.js:194:13)
00:28:53 at maybeClose (internal/child_process.js:998:16)
00:28:53 at Socket.<anonymous> (internal/child_process.js:403:11)
00:28:53 at Socket.emit (events.js:194:13)
00:28:53 at Pipe.<anonymous> (net.js:593:12)
00:28:53 ...
> parallelism isn't a problem on these machines fwiw
I'm not saying that parallelism is necessarily the issue with these tests, but the history of our test issues on CI suggests that a high processor count is suggestive of parallelism being the problem, not the other way around.
When there are tests that fail when run at the same time as lots of other tests, it tends to show up on machines in CI that have very high processor counts. Counterintuitive at first, but then when you start troubleshooting, it becomes more obvious: The processor count determines how many tests are run at the same time. If you are running 96 tests at once, you are far more likely to hit two tests that have a previously-unrecognized incompatibility. If you run 4 tests at once, this is far less likely.
Fwiw, I'm currently trying to debug this on the machine itself, and the test also fails rather frequently when it's being run as a standalone script, without other tests being run at the same time.
> Fwiw, I'm currently trying to debug this on the machine itself, and the test also fails rather frequently when it's being run as a standalone script, without other tests being run at the same time.
In response to the "Fwiw", I'd say that's worth an awful lot. 😄
Oh, yeah, and the results from https://github.com/nodejs/node/pull/26994 sure seem to rule out parallelism being the problem here too. Given that PR, I probably shouldn't have said anything at all. Sorry about the distraction! (EDIT: Although it seemed to feel like the right explanation with the PID re-use thing....)
> Although it seemed to feel like the right explanation with the PID re-use thing
Yup, that's it. All in all, the issue doesn't appear to be super complicated:
When running Node.js with --perf-basic-prof, like this test does (via NODE_OPTIONS, but that's not relevant), it will try to create a file named /tmp/perf-${pid}.map.
https://github.com/nodejs/node/blob/5c2ee4ee8dd8045b3598db219d220988b4c4199e/deps/v8/src/log.cc#L283-L299
If the file already exists and is not writable, the CHECK in L299 crashes, causing these failures. If the file does not exist, it is created, and never cleaned up.
The host to which I have SSH access has a PID range from 0 to 100000, and currently has 52453 files of that format in /tmp/, of which 50016 are owned by root; meaning, it will fail about 50 % of the time, because tests running under the iojs user don't have permission to overwrite these files.
(I'm deleting all of these files now on that host, just so that the test passes a bit more often. You can re-create the issue locally like this: https://gist.github.com/addaleax/e7d6db099ae194a3f56e473c9d4c49a6.)
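The ~50 % estimate follows directly from those figures — a new process gets a PID uniformly in the range, and fails iff that PID's map file is root-owned:

```javascript
// Sanity-check the ~50 % failure estimate using the figures quoted
// above: PIDs range over 0..100000, and 50016 of those PIDs have a
// root-owned /tmp/perf-<pid>.map left behind.
const pidRange = 100000;
const rootOwnedMaps = 50016;

const failureProbability = rootOwnedMaps / pidRange;
console.log(failureProbability.toFixed(2)); // 0.50
```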
So: tests are run sometimes as root and sometimes as iojs, leading to permissions problems.

/cc @nodejs/build @nodejs/testing @nodejs/v8
The only circumstances in which tests are run as root in our infra is if they are run manually. We run everything as 'iojs' and it never has sudo access. So I'm suspecting this is a problem with our manual access processes, maybe we need to make it clear that anyone given manual access needs to run tests as 'iojs' and the root SSH access is mainly for convenience. Or maybe we need to give only 'iojs' SSH access most of the time instead?
I think this can be closed because the running-tests-as-root thing isn't something that happens in regular CI runs, but is operator error? Or is that letting ourselves off the hook too easily?
ncu-ci walk commit only finds one recent example of this error: https://ci.nodejs.org/job/node-test-commit-freebsd/nodes=freebsd11-x64/25647/testReport/(root)/test/parallel_test_cli_node_options/
That was on test-digitalocean-freebsd11-x64-1. For some reason, I am unable to log on to that host to check for root-owned profiling files in /tmp.