Although it's not technically documented, a lot of code assumes that once a stream emits an 'error' event, it will not subsequently emit 'data' or 'end' events. This is pretty much necessary, because otherwise it's impossible for a stream consumer to know when a stream has come to rest and will emit no more events. I've reproduced a case where Node reliably emits an 'end' event after an 'error' event, which violates the expectations of code that assumes a stream has come to rest after 'error'.
To be really precise about where I tested it:
Works as expected on v0.10 (perhaps only because the 'end' event is emitted first):
Does not work as expected on v0.12 and later:
Here's a test case that's commented with what it's doing. I've tried to simplify it as much as possible, but since it's a race condition, it's tricky to get the timing just right.
/*
 * test-stream.js: demonstrates a case where a Node stream can see an 'end'
 * event after an 'error' event.
 *
 * This test case works as follows:
 *
 * (1) Set up a TCP server socket and connect to it using a TCP client.
 *     Server: set up listeners for 'end' and 'error'.
 *     Client: set up listener for 'error'.
 *
 * (2) Client: write 65536 bytes of data.
 *
 * (3) Pause one second. Behind the scenes, Node will detect that the
 *     server's socket has become readable and read all 65536 bytes. These
 *     will be buffered in JavaScript.
 *
 * (4) Server: read 65535 bytes of data from the Socket. There will be
 *     one byte left buffered on the Socket in JavaScript.
 *     Client: destroy the socket. This will send a FIN to the server.
 *
 * (5) Asynchronously (via setImmediate):
 *     Server: read 1 byte of data from the Socket. This will trigger Node
 *     to read from the underlying socket again, where it will read 0
 *     bytes, signifying the end of the stream.
 *
 *     Server: write data to the socket. Since the socket is now
 *     disconnected, eventually these writes will report EPIPE/SIGPIPE.
 *     This generally happens synchronously with respect to the write()
 *     call, but the error will be emitted asynchronously.
 *
 * (6) Asynchronously (via setImmediate):
 *     Server: read another byte from the socket. At this point, we're
 *     reading past end-of-stream, and Node will schedule an 'end' event to
 *     be emitted, but an 'error' event has already been scheduled as well,
 *     so we'll see 'error' and then 'end', which should be invalid.
 */
var mod_net = require('net');
var mod_os = require('os');
/* IP address and port used for this test case. */
var ip = '127.0.0.1';
var port = 16404;
/* We'll use this buffer as a chunk of data. */
var bufsz = 64 * 1024;
var buf;
/* State for this test */
var server; /* server's listening socket */
var ssock; /* server's connection socket */
var csock; /* client socket */
var end = false; /* server has seen "end" on its connection socket */
var error = false; /* server has seen "error" on its connection socket */
function main()
{
    console.log('versions:',
        process.version, process.arch, mod_os.platform());
    buf = new Buffer(bufsz);
    buf.fill(0);

    /*
     * (1) Set up client and server.
     */
    server = mod_net.createServer({ 'allowHalfOpen': true });
    server.on('connection', function (s) {
        console.log('server: client connected');
        ssock = s;
        ssock.on('end', function () {
            console.log('server: saw "end" on client socket');
            if (error) {
                console.log('reproduced issue!');
                process.abort();
            }
            end = true;
        });
        ssock.on('error', function (err) {
            console.log('server: saw "error" on client socket', err);
            if (error || end) {
                console.log('bailing out after server error');
                process.exit(0);
            }
            // ssock.read(1);
            error = true;
        });

        /*
         * (2) Client writes data.
         */
        csock.write(buf);

        /*
         * (3) Pause until the server sees that data.
         */
        ssock.once('readable', triggerIssue);
    });

    server.listen(port, function () {
        console.log('server: listening');
        csock = mod_net.createConnection(port, ip);
        csock.on('connect', function () {
            console.log('client: connected');
        });
        csock.on('end', function () {
            console.log('client: saw "end" on server socket');
        });
    });
}

function triggerIssue()
{
    console.log('triggering issue by destroying client socket');

    /*
     * (4) Read _most_ of the data from the socket and have the client
     * destroy the socket.
     */
    ssock.read(bufsz - 1);
    csock.destroy();

    setImmediate(function () {
        /*
         * (5) Read 1 byte of data from the socket and write data to it.
         */
        ssock.read(1);
        ssock.write(buf);
        ssock.write(buf);

        setImmediate(function () {
            /*
             * (6) Read one more byte.
             */
            ssock.read(1);
        });
    });
}
main();
The detailed output for each test I ran is here:
https://gist.github.com/davepacheco/84d450d2c25f6212a99a984a8f089b4c
@nodejs/streams Is this a bug? Documentation issue? Neither? Both?
I would need to look into it.
Isn't this unspecified and dependent on the individual stream?
As @michaelnisi noted, this is unspecified and depends on the individual stream. To the best of my knowledge, there is no mechanism/enforcement in net to prevent this. For Writable and Duplex streams (including sockets), there is the guarantee that 'error' will only be emitted once.
Also, 'error' will be emitted before 'end', 'close' and 'finish'. The correct approach is to remove all the listeners when 'error' happens.
That being said, I cannot reproduce the issue on node v4.4.2 (the one which output was linked in the issue), 4.8, 6.11, 8, on Mac OS X 10.12. @davepacheco is this still an issue? If that's the case, can you update the script to reproduce it, or maybe send a PR with a failing test?
The script currently hangs with:
versions: v4.0.0 x64 darwin
server: listening
server: client connected
client: connected
triggering issue by destroying client socket
server: saw "end" on client socket
Is there documentation that stream consumers should remove their 'close', 'end', and 'finish' listeners when 'error' happens? It seems like that makes using streams robustly a lot more complex.
I'm not seeing this on OS X El Capitan (v10.11.6), but I think that just means something about the timing has changed, not that the underlying issue was OS-specific or went away.
@davepacheco No, I do not think there is. However, most streams respect this generic pattern. Given the code we have in Node 8, I do not see how this can happen, as we destroy the handle synchronously in case of an internal error. I would be happy to take a look if you can reproduce it.
The implementations in core should not have the problem you are describing, so if they do, it is a bug. It might even be OS-specific, and maybe in libuv.
Hello. We know anyone can connect to this server, but we want only specific users to connect. How can we prevent unwanted connections and allow only known users?
There has been no movement on this in nearly a year and no reports of it happening again. I'm going to close it out, but if any new info has surfaced or you believe this should remain open, please feel free to reopen it.
Has anybody re-run the test program to see if it's still happening?
@davepacheco The situation has changed. The script hangs:
$ node s
versions: v4.8.0 x64 darwin
server: listening
server: client connected
client: connected
triggering issue by destroying client socket
server: saw "end" on client socket
Or I'm getting a nice abort:
$ node s
versions: v8.11.1 x64 darwin
server: listening
server: client connected
client: connected
triggering issue by destroying client socket
server: saw "error" on client socket { Error: write EPIPE
at _errnoException (util.js:1022:11)
at WriteWrap.afterWrite [as oncomplete] (net.js:880:14) code: 'EPIPE', errno: 'EPIPE', syscall: 'write' }
server: saw "end" on client socket
reproduced issue!
1: node::Abort() [/Users/matteo/.nvm/versions/node/v8.11.1/bin/node]
2: node::Chdir(v8::FunctionCallbackInfo<v8::Value> const&) [/Users/matteo/.nvm/versions/node/v8.11.1/bin/node]
3: v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) [/Users/matteo/.nvm/versions/node/v8.11.1/bin/node]
4: v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) [/Users/matteo/.nvm/versions/node/v8.11.1/bin/node]
5: v8::internal::Builtin_Impl_HandleApiCall(v8::internal::BuiltinArguments, v8::internal::Isolate*) [/Users/matteo/.nvm/versions/node/v8.11.1/bin/node]
6: 0x7a6215042fd
Abort trap: 6
This is also confirmed on master.
Here is some output from llnode:
* thread #1, stop reason = signal SIGSTOP
* frame #0: 0x00007fff73a70b6e libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff73c3b080 libsystem_pthread.dylib`pthread_kill + 333
frame #2: 0x00007fff739cc1ae libsystem_c.dylib`abort + 127
frame #3: 0x0000000100028b77 node`node::Abort() at node.cc:1550 [opt]
frame #4: 0x000000010002c7e2 node`node::Abort(args=<unavailable>) at node.cc:1572 [opt]
frame #5: 0x00000001001a3033 node`v8::internal::FunctionCallbackArguments::Call(this=0x00007ffeefbfe210, handler=<unavailable>) at api-arguments.cc:29 [opt]
frame #6: 0x00000001001fdc0f node`v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(isolate=0x0000000102801e00, function=<unavailable>, new_target=<unavailable>, fun_data=<unavailable>, receiver=<unavailable>, args=BuiltinArguments @ 0x00007ffeefbfe2a0) at builtins-api.cc:107 [opt]
frame #7: 0x00000001001fd2e8 node`v8::internal::Builtin_Impl_HandleApiCall(args=<unavailable>, isolate=0x0000000102801e00) at builtins-api.cc:137 [opt]
frame #8: 0x00003a0a4fc0427d
frame #9: 0x00003a0a4fc144f7
frame #10: 0x00003a0a4fc144f7
frame #11: 0x00003a0a4fc144f7
frame #12: 0x00003a0a4fc144f7
frame #13: 0x00003a0a4fc118d5
frame #14: 0x00003a0a4fc096a1
frame #15: 0x00000001004faa86 node`v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, bool, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, v8::internal::Handle<v8::internal::Object>, v8::internal::Execution::MessageHandling, v8::internal::Execution::Target) [inlined] v8::internal::GeneratedCode<v8::internal::Object*, v8::internal::Object*, v8::internal::Object*, v8::internal::Object*, int, v8::internal::Object***>::Call(args=<unavailable>, args=<unavailable>, args=<unavailable>, args=0) at simulator.h:110 [opt]
frame #16: 0x00000001004faa74 node`v8::internal::(anonymous namespace)::Invoke(isolate=0x0000000102801ec8, is_construct=<unavailable>, target=<unavailable>, receiver=<unavailable>, argc=0, args=0x0000000000000000, new_target=<unavailable>, message_handling=kReport, execution_target=kCallable) at execution.cc:153 [opt]
frame #17: 0x00000001004fa768 node`v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) [inlined] v8::internal::(anonymous namespace)::CallInternal(isolate=0x0000000102801e00, callable=Handle<v8::internal::Object> @ r12, argc=0, argv=0x0000000000000000, message_handling=kReport, target=kCallable) at execution.cc:189 [opt]
frame #18: 0x00000001004fa6ed node`v8::internal::Execution::Call(isolate=0x0000000102801e00, callable=Handle<v8::internal::Object> @ r12, receiver=<unavailable>, argc=0, argv=0x0000000000000000) at execution.cc:200 [opt]
frame #19: 0x0000000100187c16 node`v8::Function::Call(this=0x00000001028564a0, context=<unavailable>, recv=<unavailable>, argc=0, argv=0x0000000000000000) at api.cc:5089 [opt]
frame #20: 0x00000001000278f1 node`node::InternalCallbackScope::Close(this=0x00007ffeefbfe850) at node.cc:989 [opt]
frame #21: 0x0000000100027a84 node`node::InternalMakeCallback(env=0x00007ffeefbfea38, recv=<unavailable>, callback=<unavailable>, argc=<unavailable>, argv=0x0000000000000000, asyncContext=<unavailable>) at node.cc:1025 [opt]
frame #22: 0x0000000100027c12 node`node::MakeCallback(isolate=<unavailable>, recv=<unavailable>, callback=<unavailable>, argc=<unavailable>, argv=<unavailable>, asyncContext=<unavailable>) at node.cc:1079 [opt]
frame #23: 0x000000010001a8dd node`node::Environment::CheckImmediate(handle=<unavailable>) at env.cc:419 [opt]
frame #24: 0x000000010094de0a node`uv__run_check(loop=0x0000000101646750) at loop-watcher.c:67 [opt]
frame #25: 0x00000001009494c1 node`uv_run(loop=0x0000000101646750, mode=UV_RUN_DEFAULT) at core.c:369 [opt]
frame #26: 0x0000000100032061 node`node::Start(isolate=0x0000000102801e00, isolate_data=<unavailable>, argc=2, argv=<unavailable>, exec_argc=<unavailable>, exec_argv=<unavailable>) at node.cc:4504 [opt]
frame #27: 0x0000000100031860 node`node::Start(event_loop=0x0000000101646750, argc=2, argv=0x00000001024067d0, exec_argc=1, exec_argv=0x00000001024068a0) at node.cc:4583 [opt]
frame #28: 0x00000001000314cb node`node::Start(argc=<unavailable>, argv=0x00000001024067d0) at node.cc:4640 [opt]
frame #29: 0x0000000100001034 node`start + 52
Do you know why it's missing the JS stacks? It's taken from master.
New V8. llnode hasn't been updated yet.
Also, the Ignition interpreter complicates frame->function mapping. In the naive solution you only see the interpreter frames, like you do with perf(1) now.
@mmarchini has been working on that however, and with respectable stamina too, I may add.
@bnoordhuis I thought llnode was handling interpreted functions (at least on Node.js 8). Unfortunately, what I've been working on will only fix perf(1), but I'll take a look into llnode to see if it's handling interpreted frames correctly.
Looking at the output from @mcollina though, looks like bt was used instead of v8 bt, or the postmortem metadata is missing from the binary. At least some of those missing frames should appear as "\
My bad, I was using the wrong command. Anyway, I tracked down the problem and a PR is on the way.
Sorry to bomb this thread, but I'm looking for info on whether a TCP connection can recover after an 'error' event is emitted. I'm guessing not, but I'm not really sure. Currently, if 'error' is emitted, I call .end() on the current connection and try to open an entirely new one.
Unless you are emitting 'error' yourself, you can safely assume that the connection is gone. You don't need to call .end().
If you want to be safe, call .destroy()
@mcollina thanks appreciated
@ronag do you know if this is still an issue or if this is already fixed?
@BridgeAR: It still seems to be an issue. As far as I can tell, the fix was reverted and the follow up was closed... I haven't followed the entire history and rationale of this.
@mcollina mentioned the following https://github.com/nodejs/node/pull/20571#issuecomment-431650497
This can be closed for the moment, it will be fixed with the destroy work @mafintosh has been preparing for some time.
But as far as I can see from master, this is still a problem, i.e. Readable can emit 'end' after 'error'.
I wouldn't mind picking this up, but it would be useful if those involved so far could chime in. @mafintosh @mcollina
@BridgeAR: Now I think this is fixed :)
Fix in https://github.com/nodejs/node/pull/20104.