Set up a small cluster of 4 nodes: 3 member nodes + 1 proxy (the config has ETCD_PROXY="on", which is not the default).
On the proxy node, check for open TCP sockets like this:
netstat -n -p -a | fgrep etcd
When local clients on the proxy node connect to 127.0.0.1:2379, we see a new connection from an ephemeral TCP port on the proxy node to port 2379 on one of the other three working nodes. That is fine; it is the proxy behaviour in operation. However, these proxy connections are not cleaned up properly when etcd is long-lived. Checking with netstat as per above shows more and more lines of output as more activity goes via the proxy. Over time, the available file handles are consumed and eventually etcd refuses connections:
etcd: http: Accept error: accept tcp [::]:2379: accept4: too many open files; retrying in 5ms
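A quick way to watch the leak in progress is to count the process's descriptors and sockets directly. This is a sketch: ETCD_PID is a placeholder, and the current shell's PID is substituted below only so the commands run anywhere.

```shell
# Placeholder PID; substitute the etcd proxy's real PID, e.g. from
# "systemctl show -p MainPID etcd". Using the current shell here is
# only so this sketch runs standalone.
ETCD_PID=$$

# Open file descriptors held by the process:
ls /proc/"$ETCD_PID"/fd | wc -l

# Established sockets to the etcd client port (2379); the modern
# equivalent of the netstat invocation above:
ss -tn state established '( dport = :2379 )' | tail -n +2 | wc -l
```

Run this repeatedly (e.g. under "watch") on the proxy node; on a leaking proxy both numbers climb steadily.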
This appears to be related to an old issue which has possibly come back, or was never fully fixed in the first place: https://github.com/coreos/etcd/issues/1959
Restarting etcd temporarily gets it working again, only for the file handles to be gradually consumed again over time, requiring more restarts. It is not necessary to restart the entire cluster; restarting the proxy node alone is sufficient, so the proxy is certainly the culprit.
The platform is CentOS 7, using the packaged etcd installed by "yum" and launched via "systemd", as follows:
Name : etcd
Arch : x86_64
Version : 3.2.7
Release : 1.el7
Size : 39 M
Repo : installed
From repo : extras
Summary : A highly-available key value store for shared configuration
URL : https://github.com/coreos/etcd
License : ASL 2.0
Description : A highly-available key value store for shared configuration.
In the process's memory maps I see the following libraries in use by the etcd process:
/usr/lib64/libc-2.17.so
/usr/lib64/libdl-2.17.so
/usr/lib64/libpthread-2.17.so
/usr/lib64/ld-2.17.so
These are all very standard CentOS system libraries, and I doubt the bug is in a library, but at least you should be able to reproduce the same setup fairly easily. Many people commenting on the older issue reported that a similar setup (3 nodes + 1 proxy) was the way to reproduce this problem, so it appears to happen quite consistently. We are running a group of web servers and a load balancer, sharing session data via etcd; these are fairly simple read/write operations that can easily be simulated for testing. I'm guessing that the content of the data is irrelevant; a typical data block is approximately 1 KB.
> Checking with netstat as per above shows more and more lines of output as more activity goes via the proxy. Over time, the available file-handles are consumed and eventually it will refuse connections.
What is your OS file descriptor limit?
The OS limit is huge, but that's not the point: I can get a couple of days of operation out of the etcd proxy before needing a restart, and during that time the number of open sockets grows. Eventually it hits the prlimit on that etcd process (NOFILE, around 64k).
# cat /proc/sys/fs/file-max
379890
# prlimit --pid 12043
RESOURCE DESCRIPTION SOFT HARD UNITS
AS address space limit unlimited unlimited bytes
CORE max core file size 0 unlimited blocks
CPU CPU time unlimited unlimited seconds
DATA max data size unlimited unlimited bytes
FSIZE max file size unlimited unlimited blocks
LOCKS max number of file locks held unlimited unlimited
MEMLOCK max locked-in-memory address space 65536 65536 bytes
MSGQUEUE max bytes in POSIX mqueues 819200 819200 bytes
NICE max nice prio allowed to raise 0 0
NOFILE max number of open files 65536 65536
NPROC max number of processes 14990 14990
RSS max resident set size unlimited unlimited pages
RTPRIO max real-time priority 0 0
RTTIME timeout for real-time tasks unlimited unlimited microsecs
SIGPENDING max number of pending signals 14990 14990
STACK max stack size 8388608 unlimited bytes
But realistically, fewer than 500 sockets on the proxy process should be plenty for what I'm doing, if cleanup were working correctly. The fix has to be finding the place where those sockets should be closed, and closing them when they are no longer used. That's the only answer. Expanding limits can never fix the problem, because the number of sockets just continues to grow with time; at best it delays the failure.
We might have kept connections alive for reuse.
https://github.com/coreos/etcd/pull/2900 configures MaxIdleConnsPerHost as 128.
https://github.com/coreos/etcd/blob/014c3750999d0658203761ad35793847001b76ca/etcdmain/etcd.go#L211-L215
We could call *http.Transport.CloseIdleConnections every time the proxy forwards a request, but then connections won't be reused. Alternatively, we could set *http.Transport.IdleConnTimeout.
@lnx-bsp Do you want to help fix this? We are currently busy with v3 and grpc-proxy features and won't have time to address this for a while.
I will check with the boss how much resource they want to put into this. I'm better with C programming and not so strong on Go. We might consider moving to a configuration without any proxies, if that's not too difficult to achieve.
Reuse of idle connections is fine, if it actually reduces the need to open new sockets. I will collect some tcpdump traffic to confirm, but my general feeling is that none of these connections are being reused in practice. The count of sockets is much larger than 128 (it goes right into the thousands), and they sit around for more than 24 hours, so whatever timeout is in operation does not appear to be working properly. Let me come back after studying the network traffic.
I collected about 4 hours of packet capture on TCP port 2379, looking only at traffic between servers (ignoring local traffic). I see a lot of connections open from ephemeral ports on the proxy node to port 2379 on the other three nodes, and each of these sends TCP keep-alives back and forth: a burst of 4 packets every 30 seconds, 2 in each direction, like this:
02:17:26.225144 IP 192.0.2.104.32938 > 192.0.2.101.2379: Flags [.], ack 1, win 236, options [nop,nop,TS val 1332026880 ecr 1357195941], length 0
02:17:26.226092 IP 192.0.2.101.2379 > 192.0.2.104.32938: Flags [.], ack 1, win 235, options [nop,nop,TS val 1357226021 ecr 1331996891], length 0
02:17:26.316507 IP 192.0.2.101.2379 > 192.0.2.104.32938: Flags [.], ack 1, win 235, options [nop,nop,TS val 1357226112 ecr 1331996891], length 0
02:17:26.316510 IP 192.0.2.104.32938 > 192.0.2.101.2379: Flags [.], ack 1, win 236, options [nop,nop,TS val 1332026971 ecr 1357226021], length 0
Those are "length 0" at the TCP layer (i.e. all the right headers are present but no stream data is moving); they serve only to keep those connections open. From a firewall's perspective they do not appear to be idle connections, but from an application perspective they are indeed idle.
I can see a great many connections of this type (identified by ephemeral port number), and other than sending keep-alives, no real data crossed them during the hours I collected data. Thus, I'm confident the connections are neither closed nor reused; they simply generate network traffic and consume memory. If these run through a firewall or VPN, the connection-tracking table fills up, and the device may respond as if to a denial of service when it sees so many connections doing nothing but keep-alives.
https://github.com/coreos/etcd/pull/9336 probably fixes this problem.
Please try latest v3.3 with https://github.com/coreos/etcd/pull/9336.