Etcd: (error "EOF", ServerName "") error on etcd servers

Created on 26 Mar 2019 · 10 comments · Source: etcd-io/etcd

Environment

Server version: 3.3.10
Client version: 3.3.10

Issue

Running

etcdctl --endpoints "https://10.0.0.13:2379,https://10.0.0.11:2379,https://10.0.0.12:2379" member list

causes the etcd servers to log errors. However, when spaces are added after the commas, as in

etcdctl --endpoints "https://10.0.0.13:2379, https://10.0.0.11:2379, https://10.0.0.12:2379" member list

no errors are logged on the etcd servers. This is reproducible both when specifying the --endpoints flag inline and when using the ETCDCTL_ENDPOINTS environment variable.
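To see what a plain comma split does with the two spellings, the sketch below prints each element of the list between brackets. It is only an illustration, not etcdctl's own parser: it assumes the list is split on commas without trimming whitespace, in which case every endpoint after the first keeps a leading space in the second form. Whether that is what makes the errors disappear is a hypothesis that this thread does not confirm. The helper name `show_endpoints` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split an endpoint list on commas only (no whitespace
# trimming, since IFS contains no space) and print each element between
# brackets so that stray leading spaces become visible.
show_endpoints() {
  local -a eps
  IFS=',' read -r -a eps <<< "$1"
  local ep
  for ep in "${eps[@]}"; do
    printf '[%s]\n' "$ep"
  done
}

show_endpoints "https://10.0.0.13:2379,https://10.0.0.11:2379,https://10.0.0.12:2379"
show_endpoints "https://10.0.0.13:2379, https://10.0.0.11:2379, https://10.0.0.12:2379"
```

The second call prints `[ https://10.0.0.11:2379]` and `[ https://10.0.0.12:2379]`, with the leading space intact.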

This issue was discovered while upgrading from 3.2.24 to 3.3.10. We did not see this issue with 3.2.24. Note also that these etcdctl member list commands return successfully and with correct data; only the server logs are affected.

Example error logs:

Mar 26 20:53:34 vm-etcd-1 etcd[27805]: rejected connection from "10.0.0.11:57684" (error "EOF", ServerName "")
Mar 26 20:53:38 vm-etcd-1 etcd[27805]: rejected connection from "10.0.0.11:57708" (error "EOF", ServerName "")
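When comparing the two invocations, it can help to tally rejected connections per client address instead of eyeballing the journal. A minimal sketch, assuming log lines in exactly the format shown above and IPv4 client addresses; the helper name is hypothetical. On a systemd host the input could come from something like `journalctl -u etcd`, where the unit name `etcd` is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read etcd server log lines on stdin and count
# 'rejected connection from "<ip>:<port>" (error "EOF"' entries per client IP.
# Assumes IPv4 addresses (the capture stops at the first colon).
count_rejected_eof() {
  sed -n 's/.*rejected connection from "\([^":]*\):[0-9]*" (error "EOF".*/\1/p' \
    | sort | uniq -c
}

printf '%s\n' \
  'Mar 26 20:53:34 vm-etcd-1 etcd[27805]: rejected connection from "10.0.0.11:57684" (error "EOF", ServerName "")' \
  'Mar 26 20:53:38 vm-etcd-1 etcd[27805]: rejected connection from "10.0.0.11:57708" (error "EOF", ServerName "")' \
  | count_rejected_eof
```

For the two sample lines this reports a count of 2 for 10.0.0.11.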

Full example command for testing:

ETCDCTL_API=3 \
ETCDCTL_ENDPOINTS="https://10.0.0.13:2379,https://10.0.0.11:2379,https://10.0.0.12:2379" \
ETCDCTL_CACERT="/etc/kubernetes/certs/kubernetes_ca.pem" \
ETCDCTL_KEY="/etc/kubernetes/certs/etcd_server.key" \
ETCDCTL_CERT="/etc/kubernetes/certs/etcd_server.pem" \
etcdctl member list

Related issues: #10040 and #10391 were both closed as duplicates of #9949; however, #9949 does not appear to be related to this particular issue.

Source

jcrowthe

All 10 comments

This is a problem in the Go TLS standard library (13523).
It also happens on the master branch.

Associated issues: #8534, #8798, #8803

@gyuho

johncming on 27 Mar 2019

I can confirm that the rejected connection log entries do not appear when I use the --endpoints flag with spaces between the endpoints. With the ETCDCTL_ENDPOINTS environment variable, however, I still get the rejected connection log entries even with spaces between the endpoints. My environment variable is set like this:

ETCDCTL_ENDPOINTS="https://k8s-etcd-00.example.com:2379, https://k8s-etcd-01.example.com:2379, https://k8s-etcd-02.example.com:2379"

TheKangaroo on 6 Jun 2019

I can confirm the same issue.

[root@justin-cwes-03 ~]# etcdctl version
etcdctl version: 3.3.15
API version: 3.3

jejer on 23 Oct 2019

It seems etcdctl does not use the correct server IP to validate each server's certificate: it always uses the first endpoint's IP to validate all servers.

[root@justin-cwes-03 ~]# etcdctl --endpoints="https://192.16.1.18:2379,https://192.16.1.25:2379,https://192.16.1.17:2379" get /xxx --debug
ETCDCTL_CACERT=/etc/etcd/ssl/ca.pem
ETCDCTL_CERT=/etc/etcd/ssl/etcd-client.pem
ETCDCTL_COMMAND_TIMEOUT=5s
ETCDCTL_DEBUG=true
ETCDCTL_DIAL_TIMEOUT=2s
ETCDCTL_DISCOVERY_SRV=
ETCDCTL_ENDPOINTS=[https://192.16.1.18:2379,https://192.16.1.25:2379,https://192.16.1.17:2379]
ETCDCTL_HEX=false
ETCDCTL_INSECURE_DISCOVERY=true
ETCDCTL_INSECURE_SKIP_TLS_VERIFY=false
ETCDCTL_INSECURE_TRANSPORT=true
ETCDCTL_KEEPALIVE_TIME=2s
ETCDCTL_KEEPALIVE_TIMEOUT=6s
ETCDCTL_KEY=/etc/etcd/ssl/etcd-client-key.pem
ETCDCTL_USER=
ETCDCTL_WRITE_OUT=simple
WARNING: 2019/10/23 06:15:03 Adjusting keepalive ping interval to minimum period of 10s
WARNING: 2019/10/23 06:15:03 Adjusting keepalive ping interval to minimum period of 10s
INFO: 2019/10/23 06:15:03 parsed scheme: "endpoint"
INFO: 2019/10/23 06:15:03 ccResolverWrapper: sending new addresses to cc: [{https://192.16.1.18:2379 0  <nil>} {https://192.16.1.25:2379 0  <nil>} {https://192.16.1.17:2379 0  <nil>}]
WARNING: 2019/10/23 06:15:03 grpc: addrConn.createTransport failed to connect to {https://192.16.1.25:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for 127.0.0.1, 192.16.1.25, 192.16.1.25, ::1, not 192.16.1.18". Reconnecting...
WARNING: 2019/10/23 06:15:03 grpc: addrConn.createTransport failed to connect to {https://192.16.1.25:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for 127.0.0.1, 192.16.1.25, 192.16.1.25, ::1, not 192.16.1.18". Reconnecting...
WARNING: 2019/10/23 06:15:03 grpc: addrConn.createTransport failed to connect to {https://192.16.1.17:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for 127.0.0.1, 192.16.1.17, 192.16.1.17, ::1, not 192.16.1.18". Reconnecting...
WARNING: 2019/10/23 06:15:03 grpc: addrConn.createTransport failed to connect to {https://192.16.1.17:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for 127.0.0.1, 192.16.1.17, 192.16.1.17, ::1, not 192.16.1.18". Reconnecting...
/xxx
xxx
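This reading can be checked mechanically: each handshake failure should name a different dialed endpoint but the same "not ..." address, namely the first endpoint in the list. A small sketch, assuming warnings in exactly the grpc format shown above; the helper name `cert_mismatches` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from grpc x509 warnings on stdin, print the endpoint that
# was dialed and the IP its certificate was checked against (the "not <ip>" part).
cert_mismatches() {
  sed -n 's/.*failed to connect to {\([^ ]*\) .*, not \([0-9.]*\)"\..*/\1 checked-against \2/p' \
    | sort -u
}

printf '%s\n' \
  'WARNING: 2019/10/23 06:15:03 grpc: addrConn.createTransport failed to connect to {https://192.16.1.25:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for 127.0.0.1, 192.16.1.25, 192.16.1.25, ::1, not 192.16.1.18". Reconnecting...' \
  'WARNING: 2019/10/23 06:15:03 grpc: addrConn.createTransport failed to connect to {https://192.16.1.17:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for 127.0.0.1, 192.16.1.17, 192.16.1.17, ::1, not 192.16.1.18". Reconnecting...' \
  | cert_mismatches
```

On these lines it prints one entry for 192.16.1.17 and one for 192.16.1.25, both checked against 192.16.1.18, the first endpoint passed to --endpoints.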

jejer on 23 Oct 2019

Okay, this is already fixed by https://github.com/etcd-io/etcd/pull/11184

jejer on 23 Oct 2019

@jcrowthe As @jejer has verified, the issue seems to be fixed now. I am closing it, but please feel free to reopen if needed. Thanks!

spzala on 23 Oct 2019

I hit the same issue with etcd 3.3.18 after upgrading from 3.2.24. It seems a similar issue still exists.

etcd Version: 3.3.18
Git SHA: 3c8740a79
Go Version: go1.12.9
Go OS/Arch: linux/amd64

cwdsuzhou on 31 Dec 2019

We are seeing this issue on 3.3.18 as well.

davissp14 on 20 Apr 2020

@spzala Sorry for commenting on this closed issue.
Unfortunately, I ran into this on version 3.4.9 after upgrading from version 3.3.10:

Jun 06 11:16:08 kubernetes-master-03 etcd[9262]: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!
Jun 06 11:16:08 kubernetes-master-03 etcd[9262]: serving client requests on 10.3.145.10:2379
Jun 06 11:16:08 kubernetes-master-03 etcd[9262]: established a TCP streaming connection with peer ba80ef9cc6549fd9 (stream Message reader)
Jun 06 11:16:08 kubernetes-master-03 etcd[9262]: established a TCP streaming connection with peer e48bf6e4b3b874d (stream Message reader)
Jun 06 11:16:08 kubernetes-master-03 etcd[9262]: da1ce4c886f80f9c initialized peer connection; fast-forwarding 8 ticks (election ticks 10) with 2 active peer(s)
Jun 06 11:16:12 kubernetes-master-03 etcd[9262]: updated the cluster version from 3.0 to 3.4
Jun 06 11:16:12 kubernetes-master-03 etcd[9262]: enabled capabilities for version 3.4
Jun 06 11:19:58 kubernetes-master-03 etcd[9262]: rejected connection from "10.3.145.8:54824" (error "EOF", ServerName "")
Jun 06 11:22:03 kubernetes-master-03 etcd[9262]: rejected connection from "10.3.145.8:55326" (error "EOF", ServerName "")
Jun 06 11:22:24 kubernetes-master-03 etcd[9262]: rejected connection from "10.3.145.8:55432" (error "EOF", ServerName "")
Jun 06 11:22:29 kubernetes-master-03 etcd[9262]: rejected connection from "10.3.145.8:55458" (error "EOF", ServerName "")

The endpoint status is below:

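+------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|     ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |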
+------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|  10.3.145.8:2379 | ba80ef9cc6549fd9 |   3.4.9 |   23 MB |      true |      false |         2 |          9 |                  9 |        |
|  10.3.145.9:2379 |  e48bf6e4b3b874d |   3.4.9 |   23 MB |     false |      false |         2 |          9 |                  9 |        |
| 10.3.145.10:2379 | da1ce4c886f80f9c |   3.4.9 |   23 MB |     false |      false |         2 |          9 |                  9 |        |
+------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

I have followed #11184, #10391, and #10634, and the cluster is healthy: the Kubernetes production cluster on top of it runs stably. Did I miss anything important here?