Node: sockets stay in ESTABLISHED status (same as in #1924 and #5026)

Created on 7 Jan 2016  ·  20 comments  ·  Source: nodejs/node

Hello, guys

I found myself in trouble closing sockets properly in Node.js.
I tried to close socket connections from the server side (the Node.js side) with socket.end() (followed by socket.destroy(), just to do everything I can from JavaScript). But I found (using ss and netstat) that lots of sockets are not closed and stay in ESTABLISHED status forever.
The problem is: I already called socket.end() and listened for the 'close' event (for logging, and to make sure the socket is 'closed' as far as the JavaScript code is concerned). Lots of sockets that are supposed to be closed are still open when I check connections with netstat. The only difference I can think of is that real-world networks are generally bad, so if the server sends a FIN to the client there may be no response and the socket hangs there forever. What I don't understand is that after I call end() and destroy() in Node.js, the socket is expected to move into a FIN_WAIT state.
The workaround I used is setKeepAlive(true). The ESTABLISHED sockets are then eventually closed thanks to keep-alive probes at the system level (by setting net.ipv4.tcp_keepalive_time).
I found that https://github.com/nodejs/node-v0.x-archive/issues/3613 is about sockets stuck in FIN_WAIT2 status. Not sure if it is related to my problem.

Platform: Ubuntu 14.04
Node : 4.2.x
CPU: 48 cores Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
Memory: 128GB
Some system settings:

net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_rmem = 4096        87380   4194304
net.ipv4.tcp_wmem = 4096        16384   4194304
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.netdev_max_backlog = 262144
net.core.somaxconn = 2621440
net.ipv4.tcp_max_orphans = 3276800
net.ipv4.tcp_max_syn_backlog = 2621440
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_synack_retries = 1
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_mem = 94500000 915000000 927000000
net.ipv4.tcp_fin_timeout = 1
net.ipv4.tcp_keepalive_time = 30
net.ipv4.ip_local_port_range = 1024    65000
vm.overcommit_memory=1
net.ipv4.tcp_max_tw_buckets = 6000
net.netfilter.nf_conntrack_max=13107200
vm.swappiness = 20
fs.file-max = 8000000
fs.nr_open = 8000000
Labels: net, question


All 20 comments

Can you also provide a minimal, standalone code example that reproduces the problem?

@mscdex the JS side of the code is just simple end() and destroy() calls.
The scenario is: clients connect to the server and start a heartbeat, and I am trying to remove sockets whose heartbeat has stopped for a certain amount of time (such as 10 min). What I am doing is disconnecting the clients (devices) from the server side (the Node.js backend) using socket.end(). But it turns out that when I look over the TCP sockets connected to my server, there are lots of connections that are never actually closed by end() and destroy().
I am now looking over my code base for logic bugs in the whole closing process.
I'm also not sure whether Node.js shutting down sockets on the C++ side works properly under a large traffic load. With 1 million connections to my server, and 10,000 connections created and disconnected every second, could end() or destroy() malfunction for some sockets?

Do you call socket.end() or socket.destroy()? The former waits until pending data has been flushed out before closing; the latter shuts the socket down immediately.

@bnoordhuis I used socket.end() at the beginning, but now I use both. Is it possible that under heavy load a socket is not closed by the system even though we called end() and destroy()? I would also really like to know whether I can debug Node.js at the C++ level; is there any tutorial about it?

Maybe try logging socket._handle.fd right before you call socket.destroy()? If you attach strace to the process, you should see matching close(fd) system calls.

@gogorush were you able to find a solution to your problem? I am in the same boat right now. Calling socket.destroy() leaves many connections in the ESTABLISHED state in netstat -an.

I was able to work around this problem by always setting an idle timeout for all client sockets via socket.setTimeout() and calling socket.destroy() when the timeout elapses:

// Initialize server
var server = new net.Server();

// Start listening
server.listen(3000);

// Wait for client socket
server.on('connection', function (socket) {
    // Set socket idle timeout in milliseconds
    socket.setTimeout(1000 * 60 * 5); // 5 minutes

    // Wait for timeout event (socket will emit it when idle timeout elapses)
    socket.on('timeout', function () {
        // Destroy the socket
        socket.destroy();
    });
});

I find that some sockets are still stuck in the ESTABLISHED state with this workaround (about 15% of them), but at least they are not stuck in this state forever, as before without using socket.setTimeout(). I modified a few things to attempt to overcome this issue, and am 95% sure that using socket.setTimeout() finally did the trick.

Also, this appears to be a duplicate of #1924 and #5026, referencing them so they may consider this solution as well.

@eladnava Thx, this was my solution as well. It seems to be working very well so far, considering the number of connections to my server (millions). Sorry I did not write it down here earlier, but end() and destroy() together with the timeout worked. Thx again for your work; I am gonna close this issue now.

I also hit the same problem with Node 6.5.0 LTS on a Linux server. In my case, I have a list of sockets (an object), mapped by some key. So when destroying one (just socket.destroy()), I did

socketList['index'].destroy()
delete socketList['index']

I also tried

let socket = socketList['index']
socket.destroy()
delete socketList['index']

I realized that the number of ESTABLISHED connections (using netstat -ant) is greater than my socketList count, as @gogorush and @eladnava said. Should I find a way to kill the TCP connections myself? @gogorush, how does this socket.destroy() with a timeout work?

@lfernando-silva call the following on the socket as soon as it's opened:

    // Set socket idle timeout in milliseconds
    socket.setTimeout(1000 * 60 * 5); // 5 minutes

    // Wait for timeout event (node will emit it when idle timeout elapses)
    socket.on('timeout', function () {
        // Destroy the socket
        socket.destroy();
    });

This will work around the ESTABLISHED zombie sockets bug.

Thank you @eladnava, I'll do this. How do you reproduce this bug in development?

@lfernando-silva I'm pretty sure this only occurs with sockets that are passively disconnected, where the remote party does not acknowledge the FIN packet (e.g. mobile phones that lose Internet connectivity very often). This is a little harder to emulate in development; it is an issue that only occurred in production for me.

@lfernando-silva If you are using a mobile client to connect to your server, you can turn off its internet access: switch off mobile data, turn off Wi-Fi, switch off the router, etc.

@eladnava, really, for me it was just in production too. The TCP socket state itself is a problem; between the connection and noticing the disconnection event, it is sometimes really hard to know the connection state. I have already had this kind of problem using IoT TCP-based protocols.
@gogorush no, in my case it is a low-level embedded system, actually a lot of them. In dev, with two or three devices, I never noticed this problem, but in production there are almost 50 devices, and that is where I found it. This number will grow soon.

Thank you guys for the tips, I will confirm this solution after the deploy!

@lfernando-silva you can be sure that a connection is no longer active if the client has not sent a keep-alive within an allotted time period.

This is where socket.setTimeout comes in handy. Node.js will emit the 'timeout' event on the socket when the connection has timed out (no activity in X ms). All you need to do is configure the inactivity duration. If your protocol does not include keep-alive, consider using another protocol.

@eladnava I set the server timeout to 60 and the keep-alive to 20ms. It also has a list that maps the sockets that are sending business-rule messages to their states. At some point, when I need to disconnect some device, I remove its socket from the list and call socket.destroy() on it.

I was hoping the socket would actually be closed and the device could reconnect at some other time, but I realized through the IP:PORT that the removed sockets were still ESTABLISHED in netstat -ant.

Now I'm also setting a timeout of 10ms to issue the FIN packet twice.

This issue has resurfaced for me (on Node v8), even with my attempted workaround.

This issue may be related: https://github.com/nodejs/node/issues/5757

For some reason, I still have many zombie TCP sockets stuck in ESTABLISHED state.

I've deployed another workaround which includes using a standard JS setTimeout() to disconnect idle connections without using socket.setTimeout():

// Initialize server
var server = new net.Server();

// Start listening
server.listen(3000);

// Wait for client socket
server.on('connection', function (socket) {
    // Set socket idle timeout in milliseconds; wrap destroy in a
    // function so it keeps its socket context (an unbound
    // socket.destroy reference would lose `this`)
    socket.idleTimeout = setTimeout(function () {
        socket.destroy();
    }, 1000 * 60 * 5); // Destroy socket in 5 minutes

    // On new data received (this handler must live inside the
    // 'connection' callback, where `socket` is in scope)
    socket.on('data', function () {
        // Clear previous idle timeout
        clearTimeout(socket.idleTimeout);

        // Start a new idle timeout
        socket.idleTimeout = setTimeout(function () {
            socket.destroy();
        }, 1000 * 60 * 5); // Destroy socket in 5 minutes
    });
});

Testing it out now and will update on success.

@eladnava FWIW, starting in Node v10.2.0+ you can reuse existing timer objects by calling .refresh() on them to reactivate/restart them.
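A quick sketch of that timer reuse:

```javascript
// Timeout objects returned by setTimeout() expose .refresh() in
// Node v10.2.0+, restarting the countdown without allocating a new timer.
let fired = false;
const timer = setTimeout(function () {
    fired = true;
}, 50);

// e.g. call this from a 'data' handler instead of clearTimeout + setTimeout
timer.refresh();

// Don't let this sketch keep the event loop alive
timer.unref();
```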

Cheers. It appears socket.destroy() is being called on socket timeout, but the underlying system socket still stays in the ESTABLISHED state.

Seems that all sockets stuck in the ESTABLISHED state are tls.Server sockets. When I use a standard net.Server, the issue isn't reproduced. Investigating and will update with findings.

From preliminary results, it appears that my zombie TLS sockets are erroneous handshake connections or timed-out handshakes. It's not clear from the Node documentation what tls.Server() does with these, but for some reason these connections stay in the ESTABLISHED state.

Using the tlsClientError event, I can detect these connections and forcibly .destroy() them. Maybe the default behavior is to keep them open? Not sure, but destroying them seems to have solved the issue for me.

var tls = require('tls');

// ssl.getSSLConfiguration() is the author's own helper that returns
// the key/cert options object for tls.createServer()
var tlsServer = tls.createServer(ssl.getSSLConfiguration());

tlsServer.on('tlsClientError', function onTlsClientError(error, socket) {
    // Forcibly destroy the socket left over from the failed handshake
    socket.destroy();
});

So far, so good! 🎉
