https://ci.nodejs.org/job/node-test-binary-windows/17414/COMPILED_BY=vs2017,RUNNER=win2016,RUN_SUBSET=3/console
not ok 51 parallel/test-child-process-fork-closed-channel-segfault
---
duration_ms: 0.259
severity: fail
exitcode: 1
stack: |-
c:\workspace\node-test-binary-windows\test\parallel\test-child-process-fork-closed-channel-segfault.js:70
throw err;
^
Error: write EMFILE
at ChildProcess.target._send (internal/child_process.js:741:20)
at ChildProcess.target.send (internal/child_process.js:625:19)
at Worker.send (internal/cluster/worker.js:40:28)
at Socket.<anonymous> (c:\workspace\node-test-binary-windows\test\parallel\test-child-process-fork-closed-channel-segfault.js:36:16)
at Object.onceWrapper (events.js:273:13)
at Socket.emit (events.js:182:13)
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1147:10)
Likely related:
not ok 52 parallel/test-child-process-fork-net
---
duration_ms: 0.354
severity: fail
exitcode: 1
stack: |-
PARENT: server listening
CHILD: server listening
CLIENT: connected
PARENT: got connection
CLIENT: connected
CHILD: got connection
CLIENT: closed
CHILD: got connection
CHILD: got connection
CLIENT: closed
CLIENT: connected
CLIENT: connected
CLIENT: closed
CLIENT: closed
PARENT: server closed
CHILD: got socket
CLIENT: got data
CLIENT: closed
testSocket, listening
events.js:167
throw er; // Unhandled 'error' event
^
Error: write EPIPE
at ChildProcess.target._send (internal/child_process.js:741:20)
at ChildProcess.target.send (internal/child_process.js:625:19)
at SocketListSend._request (internal/socket_list.js:20:16)
at SocketListSend.close (internal/socket_list.js:40:10)
at Server.close (net.js:1624:24)
at Socket.<anonymous> (c:\workspace\node-test-binary-windows\test\parallel\test-child-process-fork-net.js:179:16)
at Socket.emit (events.js:182:13)
at TCP._handle.close [as _onclose] (net.js:596:12)
Emitted 'error' event at:
at process.nextTick (internal/child_process.js:745:39)
at process._tickCallback (internal/process/next_tick.js:61:11)
For the first one, EMFILE is predicted to occur according to this comment but not sure why it wasn't captured in the test.
@gireeshpunathil The test checks that err.message is 'Channel closed' and it's not for EMFILE. Not sure if that's a recent change to the message or what. The code that checks for that string was added in https://github.com/nodejs/node/pull/8954 by @santigimeno (when the test had a different name--it was subsequently renamed).
@nodejs/testing @nodejs/platform-windows @nodejs/child_process
(Interestingly, the string was 'channel closed' in the original PR but is now 'Channel closed' with different capitalization.)
And another one (again both tests failed - I wonder if the first test has influence on the second one).
https://ci.nodejs.org/job/node-test-binary-windows/17446/COMPILED_BY=vs2017,RUNNER=win2016,RUN_SUBSET=3/console
One issue with test-child-process-fork-net is that is uses both console.log() and console.error(). The test runner does not preserve the interleaving of messages from stdout and stderr, so the messages as shown in CI are not the order they appear in when you run the test from the command line. PR to fix that aspect of things: https://github.com/nodejs/node/pull/20873 A 馃憤 to fast-track that would be great.
Better output for test-child-process-fork-net (stdout and stderr properly interleaved):
not ok 52 parallel/test-child-process-fork-net
---
duration_ms: 0.324
severity: fail
exitcode: 1
stack: |-
PARENT: server listening
CHILD: server listening
CLIENT: connected
PARENT: got connection
CLIENT: connected
CHILD: got connection
CLIENT: closed
CHILD: got connection
CHILD: got connection
CLIENT: closed
CLIENT: connected
CLIENT: connected
CLIENT: closed
CLIENT: closed
PARENT: server closed
testSocket, listening
CHILD: got socket
CLIENT: got data
CLIENT: closed
events.js:167
throw er; // Unhandled 'error' event
^
Error: write EPIPE
at ChildProcess.target._send (internal/child_process.js:741:20)
at ChildProcess.target.send (internal/child_process.js:625:19)
at SocketListSend._request (internal/socket_list.js:20:16)
at SocketListSend.close (internal/socket_list.js:40:10)
at Server.close (net.js:1624:24)
at Socket.<anonymous> (c:\workspace\node-test-binary-windows\test\parallel\test-child-process-fork-net.js:178:16)
at Socket.emit (events.js:182:13)
at TCP._handle.close [as _onclose] (net.js:596:12)
Emitted 'error' event at:
at process.nextTick (internal/child_process.js:745:39)
at process._tickCallback (internal/process/next_tick.js:61:11)
...
I tried to run Windows stress tests for this to see how often it fails but after compiling for a long time, the builds hit a three-minute timeout and don't run the tests. See https://ci.nodejs.org/job/node-stress-single-test/1858/nodes=win2016-1p-vs2017/console and https://ci.nodejs.org/job/node-stress-single-test/1857/nodes=win2016-1p-vs2017/console.
19:31:05 Creating library c:\workspace\node-stress-single-test\nodes\win2016-1p-vs2017\Release\mksnapshot.lib and object c:\workspace\node-stress-single-test\nodes\win2016-1p-vs2017\Release\mksnapshot.exp
19:31:05 Generating code
19:34:05 Build timed out (after 3 minutes). Marking the build as aborted.
19:34:05 Build was aborted
19:34:05 Notifying upstream projects of job completion
19:34:05 Finished: ABORTED
@nodejs/build
On my Win 10 box test-child-process-fork-closed-channel-segfault fails about 6% times about parallel/test-child-process-fork-net fails about once every 30k runs.
@bzoz any chance you could have a look at test-http2-pipe and see if it still fails under load? (unrelated to this issue, of course) If so, could you build with --debug-http2 --debug-nghttp2 and post the debugging info from a failed run to this issue: https://github.com/nodejs/node/issues/20750
It would really be appreciated!
@apapirovski sure, I'll take a look
Just my 2c but this test has been flaky since its introduction, only very infrequently replicates the condition on the originally bugged version (Node 4.0.0) and is unlikely to occur exactly like this again. As such, I think we should scrap the test and move the one important check (assert.strictEqual(worker.process.channel, null);) into another test that uses child_process in a similar manner.
If we try to fix it yet again, it'll be a 6th PR to make this test less flaky. There are currently 5 separate error conditions that we ignore in this. We might be adding a sixth. IMO it's more trouble than it's worth.
Failures showing up on windows for 10.2.0 release
https://ci.nodejs.org/job/node-test-binary-windows/17534/COMPILED_BY=vs2017,RUNNER=win2016,RUN_SUBSET=3/console
Failure again on 10.2.1
https://ci.nodejs.org/job/node-test-binary-windows/17566/COMPILED_BY=vs2017,RUNNER=win2016,RUN_SUBSET=3/console
Failure on node-daily-master task:
https://ci.nodejs.org/job/node-test-binary-windows/17579/COMPILED_BY=vs2017,RUNNER=win2016,RUN_SUBSET=1/console
Host: test-azure_msft-win2016-x64-5
not ok 51 parallel/test-child-process-fork-closed-channel-segfault
---
duration_ms: 0.299
severity: fail
exitcode: 1
stack: |-
c:\workspace\node-test-binary-windows\test\parallel\test-child-process-fork-closed-channel-segfault.js:70
throw err;
^
Error: write EMFILE
at ChildProcess.target._send (internal/child_process.js:741:20)
at ChildProcess.target.send (internal/child_process.js:625:19)
at Worker.send (internal/cluster/worker.js:40:28)
at Socket.<anonymous> (c:\workspace\node-test-binary-windows\test\parallel\test-child-process-fork-closed-channel-segfault.js:36:16)
at Object.onceWrapper (events.js:273:13)
at Socket.emit (events.js:182:13)
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1147:10)
...
ok - I guess know the reason for failure (though cannot test now, about to leave for the day.)
https://github.com/nodejs/node/issues/3635#issuecomment-157714683 (refefenced in the test case) was applicable to this line - connection establishment to a server whose state is arbitrary, is prone to 3 types of errors, and hence we checked and skipped them in the client socket's error handler.
This was almost 3 years back!
Now it turns out to be that this line too is prone to the same problem, under similar circumstances - and that makes sense, as the server can close any time between client connecting and worker sending messages.
In summary, we need an error callback for worker object too, that has the same code as in the client socket callback.
Is parallel/test-child-process-fork-net actually related or should we open another issue to fix it?
@mmarchini - looking the failing stack, I believe it is the same issue. So you can apply the same patch as in #20973 over there too, if you are going for a PR. However, I have found issues with it, and please wait while we fix this.
@gireeshpunathil do you mind if I open a PR to fix parallel/test-child-process-fork-net as well?
Most helpful comment
ok - I guess know the reason for failure (though cannot test now, about to leave for the day.)
https://github.com/nodejs/node/issues/3635#issuecomment-157714683 (refefenced in the test case) was applicable to this line - connection establishment to a server whose state is arbitrary, is prone to 3 types of errors, and hence we checked and skipped them in the client socket's error handler.
This was almost 3 years back!
Now it turns out to be that this line too is prone to the same problem, under similar circumstances - and that makes sense, as the server can close any time between client connecting and worker sending messages.
In summary, we need an error callback for worker object too, that has the same code as in the client socket callback.