Server version: 3.3.10
Client version: 3.3.10
Running
etcdctl --endpoints "https://10.0.0.13:2379,https://10.0.0.11:2379,https://10.0.0.12:2379" member list
will cause etcd servers to log errors. However, when adding spaces after the commas, such as
etcdctl --endpoints "https://10.0.0.13:2379, https://10.0.0.11:2379, https://10.0.0.12:2379" member list
no errors are logged on the etcd servers. This is reproducible both when specifying the --endpoints flag inline and when using the ETCDCTL_ENDPOINTS environment variable.
This issue was discovered while upgrading from 3.2.24 to 3.3.10. We did not see this issue with 3.2.24. Note also that these etcdctl member list commands return successfully and with correct data. The only problem is the log entries.
Example error logs:
Mar 26 20:53:34 vm-etcd-1 etcd[27805]: rejected connection from "10.0.0.11:57684" (error "EOF", ServerName "")
Mar 26 20:53:38 vm-etcd-1 etcd[27805]: rejected connection from "10.0.0.11:57708" (error "EOF", ServerName "")
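To compare invocations (with and without spaces, flag versus environment variable), it can help to count these entries per source address rather than eyeball the log. The following is a minimal sketch, assuming the log lines have the format shown above and the client address is IPv4; `count_rejected` and `rejected_sources` are hypothetical helper names, not part of etcd.

```shell
#!/usr/bin/env bash

# Count "rejected connection" entries read from stdin.
# grep -c still prints 0 when nothing matches but exits non-zero,
# so the exit status is deliberately ignored.
count_rejected() {
  grep -c 'rejected connection from' || true
}

# Print "<source-ip> <count>" for each client address that was rejected.
# Assumes the "ip:port" form seen in the logs above (IPv4 only).
rejected_sources() {
  sed -n 's/.*rejected connection from "\([^":]*\):[0-9]*".*/\1/p' \
    | sort | uniq -c | awk '{print $2, $1}'
}
```

On a host where etcd logs to the systemd journal under a unit named `etcd` (an assumption; adjust to your setup), usage might look like `journalctl -u etcd --since "5 minutes ago" | rejected_sources`, run once before and once after the etcdctl command under test.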
Full example command for testing:
ETCDCTL_API=3 \
ETCDCTL_ENDPOINTS="https://10.0.0.13:2379,https://10.0.0.11:2379,https://10.0.0.12:2379" \
ETCDCTL_CACERT="/etc/kubernetes/certs/kubernetes_ca.pem" \
ETCDCTL_KEY="/etc/kubernetes/certs/etcd_server.key" \
ETCDCTL_CERT="/etc/kubernetes/certs/etcd_server.pem" \
etcdctl member list
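To narrow down which endpoint triggers the log entries, one option is to run the same command against each endpoint separately. This is a sketch only: `split_endpoints` is a hypothetical helper, and the list is held in a plain `ENDPOINTS` variable (also hypothetical) because etcdctl is known to reject a flag combined with its matching `ETCDCTL_` environment variable.

```shell
#!/usr/bin/env bash

# Print each endpoint of a comma-separated list on its own line,
# trimming whitespace around each entry and skipping empty entries.
split_endpoints() {
  local list="$1" ep
  local IFS=','
  for ep in $list; do
    ep="${ep#"${ep%%[![:space:]]*}"}"   # strip leading whitespace
    ep="${ep%"${ep##*[![:space:]]}"}"   # strip trailing whitespace
    [ -n "$ep" ] && printf '%s\n' "$ep"
  done
  return 0
}

# Example (needs a reachable cluster, so it is left commented out):
# ENDPOINTS="https://10.0.0.13:2379,https://10.0.0.11:2379,https://10.0.0.12:2379"
# split_endpoints "$ENDPOINTS" | while read -r ep; do
#   echo "== $ep"
#   ETCDCTL_API=3 etcdctl --endpoints "$ep" member list
# done
```

If no single-endpoint run produces a "rejected connection" entry, that would suggest the entries come from how the client handles multiple endpoints rather than from any one server.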
Relevant other issues: #10040 and #10391 were both closed as duplicates of #9949; however, #9949 does not appear to be related to this particular issue.
This is a problem in the Go TLS standard library (13523).
It happens on the master version too.
The associated issues like
@gyuho
I can confirm that the rejected connection log entries do not appear when I use the --endpoints flag with spaces between the endpoints. But with the ETCDCTL_ENDPOINTS env var, I still get the rejected connection log entries even with spaces between the endpoints. My env var is set like this:
ETCDCTL_ENDPOINTS="https://k8s-etcd-00.example.com:2379, https://k8s-etcd-01.example.com:2379, https://k8s-etcd-02.example.com:2379"
I can confirm the same issue.
[root@justin-cwes-03 ~]# etcdctl version
etcdctl version: 3.3.15
API version: 3.3
It seems etcdctl does not use the correct server IP to validate each server cert; it always uses the first endpoint's IP to validate all servers.
[root@justin-cwes-03 ~]# etcdctl --endpoints="https://192.16.1.18:2379,https://192.16.1.25:2379,https://192.16.1.17:2379" get /xxx --debug
ETCDCTL_CACERT=/etc/etcd/ssl/ca.pem
ETCDCTL_CERT=/etc/etcd/ssl/etcd-client.pem
ETCDCTL_COMMAND_TIMEOUT=5s
ETCDCTL_DEBUG=true
ETCDCTL_DIAL_TIMEOUT=2s
ETCDCTL_DISCOVERY_SRV=
ETCDCTL_ENDPOINTS=[https://192.16.1.18:2379,https://192.16.1.25:2379,https://192.16.1.17:2379]
ETCDCTL_HEX=false
ETCDCTL_INSECURE_DISCOVERY=true
ETCDCTL_INSECURE_SKIP_TLS_VERIFY=false
ETCDCTL_INSECURE_TRANSPORT=true
ETCDCTL_KEEPALIVE_TIME=2s
ETCDCTL_KEEPALIVE_TIMEOUT=6s
ETCDCTL_KEY=/etc/etcd/ssl/etcd-client-key.pem
ETCDCTL_USER=
ETCDCTL_WRITE_OUT=simple
WARNING: 2019/10/23 06:15:03 Adjusting keepalive ping interval to minimum period of 10s
WARNING: 2019/10/23 06:15:03 Adjusting keepalive ping interval to minimum period of 10s
INFO: 2019/10/23 06:15:03 parsed scheme: "endpoint"
INFO: 2019/10/23 06:15:03 ccResolverWrapper: sending new addresses to cc: [{https://192.16.1.18:2379 0 <nil>} {https://192.16.1.25:2379 0 <nil>} {https://192.16.1.17:2379 0 <nil>}]
WARNING: 2019/10/23 06:15:03 grpc: addrConn.createTransport failed to connect to {https://192.16.1.25:2379 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for 127.0.0.1, 192.16.1.25, 192.16.1.25, ::1, not 192.16.1.18". Reconnecting...
WARNING: 2019/10/23 06:15:03 grpc: addrConn.createTransport failed to connect to {https://192.16.1.25:2379 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for 127.0.0.1, 192.16.1.25, 192.16.1.25, ::1, not 192.16.1.18". Reconnecting...
WARNING: 2019/10/23 06:15:03 grpc: addrConn.createTransport failed to connect to {https://192.16.1.17:2379 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for 127.0.0.1, 192.16.1.17, 192.16.1.17, ::1, not 192.16.1.18". Reconnecting...
WARNING: 2019/10/23 06:15:03 grpc: addrConn.createTransport failed to connect to {https://192.16.1.17:2379 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for 127.0.0.1, 192.16.1.17, 192.16.1.17, ::1, not 192.16.1.18". Reconnecting...
/xxx
xxx
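The mismatch is easier to see when the handshake warnings are reduced to the endpoint that was dialed and the name the certificate was checked against. A minimal sketch, assuming the gRPC warning format shown in the debug output above (other versions may word it differently); `tls_mismatches` is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Read etcdctl --debug output on stdin and print one line per distinct
# handshake failure: the endpoint that was dialed and the name the
# server certificate was validated against.
tls_mismatches() {
  sed -n 's/.*failed to connect to {\([^ ]*\) .*x509: certificate is valid for .*, not \([^"]*\)".*/dialed=\1 checked_as=\2/p' \
    | sort -u
}
```

Assuming the warnings go to stderr, something like `etcdctl ... --debug 2>&1 | tls_mismatches` would apply this to a live run. For the output above, every failing endpoint is reported as checked against 192.16.1.18, the first endpoint in the list.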
Okay, this is already fixed by https://github.com/etcd-io/etcd/pull/11184
@jcrowthe as @jejer has verified, the issue seems to be fixed now. I am closing it, but please feel free to reopen if needed. Thanks!
I hit the same issue with etcd 3.3.18 after upgrading from 3.2.24. It seems a similar issue still exists.
etcd Version: 3.3.18
Git SHA: 3c8740a79
Go Version: go1.12.9
Go OS/Arch: linux/amd64
We are seeing this issue on 3.3.18 as well.
@spzala Sorry for commenting again on this closed issue.
Unfortunately, I hit this on version 3.4.9 after upgrading from version 3.3.10.
Jun 06 11:16:08 kubernetes-master-03 etcd[9262]: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!
Jun 06 11:16:08 kubernetes-master-03 etcd[9262]: serving client requests on 10.3.145.10:2379
Jun 06 11:16:08 kubernetes-master-03 etcd[9262]: established a TCP streaming connection with peer ba80ef9cc6549fd9 (stream Message reader)
Jun 06 11:16:08 kubernetes-master-03 etcd[9262]: established a TCP streaming connection with peer e48bf6e4b3b874d (stream Message reader)
Jun 06 11:16:08 kubernetes-master-03 etcd[9262]: da1ce4c886f80f9c initialized peer connection; fast-forwarding 8 ticks (election ticks 10) with 2 active peer(s)
Jun 06 11:16:12 kubernetes-master-03 etcd[9262]: updated the cluster version from 3.0 to 3.4
Jun 06 11:16:12 kubernetes-master-03 etcd[9262]: enabled capabilities for version 3.4
Jun 06 11:19:58 kubernetes-master-03 etcd[9262]: rejected connection from "10.3.145.8:54824" (error "EOF", ServerName "")
Jun 06 11:22:03 kubernetes-master-03 etcd[9262]: rejected connection from "10.3.145.8:55326" (error "EOF", ServerName "")
Jun 06 11:22:24 kubernetes-master-03 etcd[9262]: rejected connection from "10.3.145.8:55432" (error "EOF", ServerName "")
Jun 06 11:22:29 kubernetes-master-03 etcd[9262]: rejected connection from "10.3.145.8:55458" (error "EOF", ServerName "")
The endpoint status is below:
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 10.3.145.8:2379 | ba80ef9cc6549fd9 | 3.4.9 | 23 MB | true | false | 2 | 9 | 9 | |
| 10.3.145.9:2379 | e48bf6e4b3b874d | 3.4.9 | 23 MB | false | false | 2 | 9 | 9 | |
| 10.3.145.10:2379 | da1ce4c886f80f9c | 3.4.9 | 23 MB | false | false | 2 | 9 | 9 | |
+------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
I have followed #11184, #10391, and #10634, and the cluster appears healthy, judging by the stable operation of our production Kubernetes cluster. Did I miss anything important here?
Still hitting this on v3.4.13 as well