Set up a small cluster of 4 nodes: 3 member nodes + 1 proxy (the config has ETCD_PROXY="on", which is not the default).
On the proxy node, check for open TCP sockets like this:
netstat -n -p -a | fgrep etcd
When local clients on the proxy node connect to 127.0.0.1:2379, we see a new connection from an ephemeral TCP port on the proxy node to port 2379 on one of the other three working nodes. That is fine; it is the proxy behaviour in operation. However, these proxy connections are not cleaned up properly when etcd is long-lived. Checking with netstat as per above shows more and more lines of output as more activity goes via the proxy. Over time, the available file handles are consumed and eventually etcd refuses connections:
etcd: http: Accept error: accept tcp [::]:2379: accept4: too many open files; retrying in 5ms
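A quick way to watch the leak in progress is to count the process's descriptors and sockets directly. This is a sketch: ETCD_PID is a placeholder, and the current shell's PID is substituted below only so the commands run anywhere.

```shell
# Placeholder PID; substitute the etcd proxy's real PID, e.g. from
# "systemctl show -p MainPID etcd". Using the current shell here is
# only so this sketch runs standalone.
ETCD_PID=$$

# Open file descriptors held by the process:
ls /proc/"$ETCD_PID"/fd | wc -l

# Established sockets to the etcd client port (2379); the modern
# equivalent of the netstat invocation above:
ss -tn state established '( dport = :2379 )' | tail -n +2 | wc -l
```

Run this repeatedly (e.g. under "watch") on the proxy node; on a leaking proxy both numbers climb steadily.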
This appears to be related to an old issue which has possibly come back, or was never fully fixed in the first place: https://github.com/coreos/etcd/issues/1959
Restarting etcd temporarily gets it working again, only for the file handles to be gradually consumed again over time, requiring more restarts. It is not necessary to restart the entire cluster; restarting the proxy node alone is sufficient, so the proxy is certainly the culprit.
The platform is CentOS 7, using the packaged etcd installed by "yum" and launched via "systemd", as follows:
Name : etcd
Arch : x86_64
Version : 3.2.7
Release : 1.el7
Size : 39 M
Repo : installed
From repo : extras
Summary : A highly-available key value store for shared configuration
URL : https://github.com/coreos/etcd
License : ASL 2.0
Description : A highly-available key value store for shared configuration.
In the process's memory maps I see the following libraries in use by the etcd process:
/usr/lib64/libc-2.17.so
/usr/lib64/libdl-2.17.so
/usr/lib64/libpthread-2.17.so
/usr/lib64/ld-2.17.so
These are all very standard CentOS system libraries, and I doubt the bug is in a library, but at least you should be able to reproduce the same setup fairly easily. Many people commenting on the older issue reported that a similar setup (3 nodes + 1 proxy) was the way to reproduce this problem, so it appears to happen quite consistently. We are running a group of web servers and a load balancer, sharing session data via etcd; these are fairly simple read/write operations that can easily be simulated for testing. I'm guessing that the content of the data is irrelevant; a typical data block is approximately 1 KB.
> Checking with netstat as per above shows more and more lines of output as more activity goes via the proxy. Over time, the available file-handles are consumed and eventually it will refuse connections.
What is your OS file descriptor limit?
The OS limit is huge, but that's not the point: I can get a couple of days of operation out of the etcd proxy before needing a restart, and during that time the number of open sockets grows. Eventually it hits the prlimit on that etcd process (NOFILE, around 64k).
# cat /proc/sys/fs/file-max
379890
# prlimit --pid 12043
RESOURCE DESCRIPTION SOFT HARD UNITS
AS address space limit unlimited unlimited bytes
CORE max core file size 0 unlimited blocks
CPU CPU time unlimited unlimited seconds
DATA max data size unlimited unlimited bytes
FSIZE max file size unlimited unlimited blocks
LOCKS max number of file locks held unlimited unlimited
MEMLOCK max locked-in-memory address space 65536 65536 bytes
MSGQUEUE max bytes in POSIX mqueues 819200 819200 bytes
NICE max nice prio allowed to raise 0 0
NOFILE max number of open files 65536 65536
NPROC max number of processes 14990 14990
RSS max resident set size unlimited unlimited pages
RTPRIO max real-time priority 0 0
RTTIME timeout for real-time tasks unlimited unlimited microsecs
SIGPENDING max number of pending signals 14990 14990
STACK max stack size 8388608 unlimited bytes
But realistically, fewer than 500 sockets on the proxy process should be plenty for what I'm doing, if cleanup were working correctly. The fix has to be finding the place where those sockets should be closed, and closing them when they are no longer used. That's the only answer. Expanding limits can never fix the problem, because the number of sockets just continues to grow with time; at best it delays the failure.
We might have kept connections alive for reuse.
https://github.com/coreos/etcd/pull/2900 configures MaxIdleConnsPerHost as 128.
https://github.com/coreos/etcd/blob/014c3750999d0658203761ad35793847001b76ca/etcdmain/etcd.go#L211-L215
We could call *http.Transport.CloseIdleConnections every time the proxy forwards a request, but then connections won't be reused. Alternatively, we could set *http.Transport.IdleConnTimeout.
@lnx-bsp Do you want to help fix this? We are currently busy with v3 and grpc-proxy features and won't have time to address this for a while.
I will check with the boss how much resource they want to put into this. I'm better with C programming and not so strong on Go. We might consider moving to a configuration without any proxies, if that's not too difficult to achieve.
Reuse of idle connections is fine, if it actually reduces the need to open new sockets. I will collect some tcpdump traffic to confirm, but my general feeling is that none of these connections are being reused in practice. The count of sockets is much larger than 128 (it goes right into the thousands), and they sit around for more than 24 hours, so whatever timeout is in operation does not appear to be working properly. Let me come back after studying the network traffic.
I collected about 4 hours of packet capture on TCP port 2379, looking only at traffic between servers (ignoring local traffic). I see a lot of connections open from ephemeral ports on the proxy node to port 2379 on the other three nodes, and each of these sends TCP keep-alives back and forth: a burst of 4 packets every 30 seconds, 2 in each direction, like this:
02:17:26.225144 IP 192.0.2.104.32938 > 192.0.2.101.2379: Flags [.], ack 1, win 236, options [nop,nop,TS val 1332026880 ecr 1357195941], length 0
02:17:26.226092 IP 192.0.2.101.2379 > 192.0.2.104.32938: Flags [.], ack 1, win 235, options [nop,nop,TS val 1357226021 ecr 1331996891], length 0
02:17:26.316507 IP 192.0.2.101.2379 > 192.0.2.104.32938: Flags [.], ack 1, win 235, options [nop,nop,TS val 1357226112 ecr 1331996891], length 0
02:17:26.316510 IP 192.0.2.104.32938 > 192.0.2.101.2379: Flags [.], ack 1, win 236, options [nop,nop,TS val 1332026971 ecr 1357226021], length 0
Those are "length 0" at the TCP layer (i.e. all the right headers are present but no stream data is moving); they serve only to keep those connections open. From a firewall's perspective they do not appear to be idle connections, but from an application perspective they are indeed idle.
I can see a great many connections of this type (identified by ephemeral port number), and other than sending keep-alives, no real data crossed them during the hours I collected data. Thus, I'm confident the connections are neither closed nor reused; they simply generate network traffic and consume memory. If these run through a firewall or VPN, the connection-tracking table fills up, and the device may respond as if to a denial of service when it sees so many connections doing nothing but keep-alives.
https://github.com/coreos/etcd/pull/9336 probably fixes this problem.
Please try latest v3.3 with https://github.com/coreos/etcd/pull/9336.