https://ci.nodejs.org/job/node-test-commit-linuxone/857/nodes=rhel72-s390x/consoleFull
23:45:13 not ok 192 parallel/test-child-process-exec-timeout
23:45:13 ---
23:45:13 duration_ms: 0.647
23:45:13 severity: fail
23:45:13 exitcode: 1
23:45:13 stack: |-
23:45:13 assert.js:77
23:45:13 throw new AssertionError(obj);
23:45:13 ^
23:45:13
23:45:13 AssertionError [ERR_ASSERTION]: Input A expected to strictly equal input B:
23:45:13 + expected - actual
23:45:13
23:45:13 - 'SIGSEGV'
23:45:13 + 'SIGTERM'
23:45:13 at cp.exec.common.mustCall (/data/iojs/build/workspace/node-test-commit-linuxone/nodes/rhel72-s390x/test/parallel/test-child-process-exec-timeout.js:34:12)
23:45:13 at /data/iojs/build/workspace/node-test-commit-linuxone/nodes/rhel72-s390x/test/common/index.js:474:15
23:45:13 at ChildProcess.exithandler (child_process.js:289:5)
23:45:13 at ChildProcess.emit (events.js:182:13)
23:45:13 at maybeClose (internal/child_process.js:957:16)
23:45:13 at Process.ChildProcess._handle.onexit (internal/child_process.js:246:5)
23:45:13 ...
Again today: https://ci.nodejs.org/job/node-test-commit-linuxone/1000/nodes=rhel72-s390x/testReport/
assert.js:77
throw new AssertionError(obj);
^
AssertionError [ERR_ASSERTION]: Input A expected to strictly equal input B:
+ expected - actual
- 'SIGSEGV'
+ 'SIGTERM'
at cp.exec.common.mustCall (/data/iojs/build/workspace/node-test-commit-linuxone/nodes/rhel72-s390x/test/parallel/test-child-process-exec-timeout.js:34:12)
at /data/iojs/build/workspace/node-test-commit-linuxone/nodes/rhel72-s390x/test/common/index.js:474:15
at ChildProcess.exithandler (child_process.js:289:5)
at ChildProcess.emit (events.js:182:13)
at maybeClose (internal/child_process.js:957:16)
at Process.ChildProcess._handle.onexit (internal/child_process.js:246:5)
@nodejs/testing @nodejs/build @nodejs/child_process
reviewing the type of failures and the test case, my opinion is that the test case logic is vulnerable for race conditions and arbitrary behaviors that are platform / load dependant.
1.
cp.exec(cmd, {timeout: 1}...
assert.strictEqual(err.signal, ...
The kill signal may be delivered:
execed into the new process (post fork phase)exit routineWhich of this is going to happen will depend on:
cp.exec(cmd, {timeout: 2 ** 30} ...
assert.strictEqual(stdout.trim(), 'child stdout');
As the child process exits immediately after writing to the console, and as the console writing is non-blocking, there is no guarentee that the data will be flushed to the stream, whatever tiny the data element is, even if the parent waits for a long time. ref: https://github.com/nodejs/node/issues/19218
In summary, I believe there is nothing LinuxONE specific, and the reason rooted in the test case itself.
for 1 I propose the child to have a timer that is at least more than 1000 and the parent to maintain a timeout that is between 1000 and the child's timeout.
for 2 I propose to end the standard streams in the child before exiting in expectation of draining the buffered data.
Another one today:
https://ci.nodejs.org/job/node-test-commit-linux/18540/nodes=fedora-last-latest-x64/consoleText
ot ok 193 parallel/test-child-process-exec-timeout
---
duration_ms: 0.336
severity: fail
exitcode: 1
stack: |-
assert.js:77
throw new AssertionError(obj);
^
AssertionError [ERR_ASSERTION]: Input A expected to strictly equal input B:
+ expected - actual
- 'SIGSEGV'
+ 'SIGTERM'
at cp.exec.common.mustCall (/home/iojs/build/workspace/node-test-commit-linux/nodes/fedora-last-latest-x64/test/parallel/test-child-process-exec-timeout.js:34:12)
at /home/iojs/build/workspace/node-test-commit-linux/nodes/fedora-last-latest-x64/test/common/index.js:474:15
at ChildProcess.exithandler (child_process.js:289:5)
at ChildProcess.emit (events.js:182:13)
at maybeClose (internal/child_process.js:957:16)
at Process.ChildProcess._handle.onexit (internal/child_process.js:246:5)
...
Will see if I can implement fixes for the issues identified by @gireeshpunathil unless he or someone else already has something in the works or even submitted.
Is the fact that we can get SIGSEGV instead of SIGTERM from this code a bug or a fact of life? (If a bug, is it in this code or in Node.js/libuv?)
'use strict';
if (process.argv[2] === 'child') {
setTimeout(() => {}, 1000);
return;
}
const cmd = `"${process.execPath}" "${__filename}" child`;
const assert = require('assert');
const cp = require('child_process');
cp.exec(cmd, { timeout: 1 }, (err, stdout, stderr) => {
assert.strictEqual(err.signal, 'SIGTERM');
});
SIGSEGV in the test.parallel.status with the relevant issue in the libuv tracker (opening it if it doesn't already exist).I propose the child to have a timer that is at least more than 1000 and the parent to maintain a timeout that is between 1000 and the child's timeout.
That may mitigate things, but I don't think it will fix it entirely because it's still dependent on timing in two independent processes. If we're going to go that route, we should definitely move this test to sequential so that the processes aren't competing with other tests for resources.
Update: I was able to trivially cause the problem using the 15 lines of code in the previous comment on test-rackspace-fedora26-x64-1:
root@test-rackspace-fedora26-x64-1 ~]# /home/iojs/build/workspace/node-test-commit-linux/nodes/fedora-last-latest-x64/node /var/tmp/test.js & /home/iojs/build/workspace/node-test-commit-linux/nodes/fedora-last-latest-x64/node /var/tmp/test.js &
[1] 5212
[2] 5213
[root@test-rackspace-fedora26-x64-1 ~]# assert.js:77
throw new AssertionError(obj);
^
AssertionError [ERR_ASSERTION]: Input A expected to strictly equal input B:
+ expected - actual
- 'SIGSEGV'
+ 'SIGTERM'
at cp.exec (/var/tmp/test.js:14:9)
at ChildProcess.exithandler (child_process.js:289:5)
at ChildProcess.emit (events.js:182:13)
at maybeClose (internal/child_process.js:957:16)
at Process.ChildProcess._handle.onexit (internal/child_process.js:246:5)
Bonus: The process hung there too. I had to end it with cntl-C.
(The above reproduction doesn't happen every time. Like, not even close to every time. Although I certainly suspect that running 4 or 8 or 16 simultaneous instances would increase the reproducibility.)
Update: Reproduces pretty reliably at 64 simultaneous instances.
@nodejs/libuv
link this with https://github.com/nodejs/node/issues/11052 as that has new findings.
As #11052 has revealed an unsafe singal stack issue, it explains one type of failure in this test: expected SIGTERM but got SIGSEGV so I would park my theories on the test case logic. The other type: expected child stdout but got ' ' is unrelated to this issue, and may require a rework.
Most helpful comment
As #11052 has revealed an unsafe singal stack issue, it explains one type of failure in this test: expected
SIGTERMbut gotSIGSEGVso I would park my theories on the test case logic. The other type: expectedchild stdoutbut got' 'is unrelated to this issue, and may require a rework.