Bug Report
kubeadm version (use kubeadm version):
v1.14.0
Environment:
Kubernetes version (use kubectl version): v1.13.5
Cloud provider or hardware configuration: AWS EC2
OS (e.g. from /etc/os-release): Ubuntu 16.04.6 LTS
Kernel (e.g. uname -a): Linux k8s-node-1 4.4.0-143-generic #169-Ubuntu SMP Thu Feb 7 07:56:38 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Others: kubeadm-provisioned single-master k8s cluster (3 nodes); the cluster was originally created with kubeadm when k8s was at v1.9.0.
Used kubeadm to upgrade the cluster. Upgrading from v1.13.4 to v1.13.5 was successful; upgrading to v1.14.0 failed because the kubeadm upgrade plan pre-flight checks try to connect to etcd using the node's private IP (assigned to NIC eth0) instead of the loopback address etcd is bound to.
Error
root@k8s-node-1:~# sudo kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error syncing endpoints with etc: dial tcp 192.168.100.21:2379: connect: connection refused
As per the kubeadm init workflow, in a single-master cluster the etcd pod is created from a static pod manifest. Looking at the manifest, etcd binds to 127.0.0.1 only and is not exposed to the outside world (a quick check is shown right after the manifest).
root@k8s-node-1:/etc/kubernetes/manifests# cat etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://127.0.0.1:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://127.0.0.1:2380
    - --initial-cluster=k8s-node-1=https://127.0.0.1:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379
    - --listen-peer-urls=https://127.0.0.1:2380
    - --name=k8s-node-1
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd:3.2.24
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -ec
        - ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt
          --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
          get foo
      failureThreshold: 8
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: etcd
    resources: {}
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
status: {}
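To double-check that etcd is really only bound to the loopback address (and not to the eth0 IP), a quick look at the listening sockets on the master node is enough, for example:

# Expect to see only 127.0.0.1:2379 here, not 192.168.100.21:2379
sudo ss -tlnp | grep 2379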
kubeadm upgrade plan should work as expected and output the upgrade details, just like the v1.13.4 to v1.13.5 upgrade did:
ubuntu@k8s-node-1:~$ sudo kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.13.4
[upgrade/versions] kubeadm version: v1.13.5
I0327 09:52:14.655319 12224 version.go:237] remote version is much newer: v1.14.0; falling back to: stable-1.13
[upgrade/versions] Latest stable version: v1.13.5
[upgrade/versions] Latest version in the v1.13 series: v1.13.5
Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       AVAILABLE
Kubelet     2 x v1.13.4   v1.13.5
            1 x v1.13.5   v1.13.5
Upgrade to the latest version in the v1.13 series:
COMPONENT            CURRENT   AVAILABLE
API Server           v1.13.4   v1.13.5
Controller Manager   v1.13.4   v1.13.5
Scheduler            v1.13.4   v1.13.5
Kube Proxy           v1.13.4   v1.13.5
CoreDNS              1.2.6     1.2.6
Etcd                 3.2.24    3.2.24
You can now apply the upgrade by executing the following command:
kubeadm upgrade apply v1.13.5
_____________________________________________________________________
ubuntu@k8s-node-1:~$ sudo kubeadm upgrade apply v1.13.5
Follow the upgrade guide and upgrade any v1.13.x cluster (created using kubeadm) to v1.14.0.
I've tried changing the bind address, but it has so many dependencies that it breaks more than it fixes. I also tried exposing the pod as a NodePort service, and tried iptables rules to forward traffic destined to the node IP (192.168.100.21 in this case) on port 2379 to loopback, with no luck.
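The iptables rule I tried was something along these lines (illustrative only; the exact rule may have differed, and in any case it did not help):

# DNAT locally generated traffic aimed at the node IP on 2379 back to loopback
sudo iptables -t nat -A OUTPUT -p tcp -d 192.168.100.21 --dport 2379 -j DNAT --to-destination 127.0.0.1:2379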
Is there a way to override the etcd endpoint when running kubeadm upgrade plan? That would be the easiest solution.
Hmm, thanks for the report.
I will try to reproduce your problem.
$ kubectl get nodes
NAME         STATUS   ROLES    AGE    VERSION
luboitvbox   Ready    master   6m9s   v1.13.5
$ kubeadm version --output=short
v1.14.0
$ sudo kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.13.5
[upgrade/versions] kubeadm version: v1.14.0
Awesome, you're up-to-date! Enjoy!
Here is what I get. This is a bit of a bug on its own, because it's telling me that I'm up to date when it should be telling me to upgrade to v1.14.0. I will log a separate bug about this "Awesome, you're up-to-date! Enjoy!" case.
But my etcd manifest looks like this:
$ sudo cat /etc/kubernetes/manifests/etcd.yaml
...
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.0.102:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://192.168.0.102:2380
    - --initial-cluster=luboitvbox=https://192.168.0.102:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.0.102:2379
    - --listen-peer-urls=https://192.168.0.102:2380
    - --name=luboitvbox
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd:3.2.24
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -ec
        - ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt
          --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
...
Did you happen to create this cluster using 1.12 before upgrading to 1.13?
I remember that we made some changes to the etcd addresses related to HA setups.
Try making your manifest look like the one above.
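For your node that would mean changing the address-related flags to something like this (using the IP and node name from your report; please double-check against your own setup):

    - --advertise-client-urls=https://192.168.100.21:2379
    - --initial-advertise-peer-urls=https://192.168.100.21:2380
    - --initial-cluster=k8s-node-1=https://192.168.100.21:2380
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.100.21:2379
    - --listen-peer-urls=https://192.168.100.21:2380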
$ sudo kubeadm upgrade apply v1.14.0
...
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.14.0". Enjoy!
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
...
# (upgrade kubelet)
$ sudo systemctl restart kubelet
$ kubectl get nodes
NAME         STATUS   ROLES    AGE   VERSION
luboitvbox   Ready    master   23m   v1.14.0
The upgrade worked for me.
@neolit123 Thanks for the comments. The cluster was initially created using kubeadm v1.9.x and later rebuilt, but definitely with a version before v1.12.0. No wonder the etcd static pod manifest is different.
What exactly has changed in the etcd manifest (since v1.12.0)? I tried to search for that but had no luck.
I'll try to generate new static pod manifests with the latest version on a different machine and see if I can figure it out (along with the dependencies).
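The rough idea is to run only the etcd phase of kubeadm init on a scratch VM and diff the generated manifest against mine, something like:

# On a throwaway VM with the latest kubeadm installed (a sketch; exact subcommands may differ by version)
sudo kubeadm init phase etcd local
sudo cat /etc/kubernetes/manifests/etcd.yaml   # compare against the manifest from my cluster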
@terrywang
It was done here so that we can properly support stacked etcd members in an HA setup:
https://github.com/kubernetes/kubernetes/pull/69486
More details here:
https://github.com/kubernetes/kubeadm/issues/1123
That said, I think we had a way to handle this type of upgrade transparently between 1.12 and 1.13, so your 1.13 etcd manifest should have been auto-converted to use the network interface address. Possibly something went wrong in the process, but this is also the first report we are seeing related to this.
Please let me know if you remember anything like modifying the etcd manifests manually, which could have broken our 1.12->1.13 logic.
Closing in favor of: https://github.com/kubernetes/kubeadm/issues/1471
@neolit123 Thanks again for the info. Good to know.
I've regenerated the static pod manifests by running the phase with the latest version of kubeadm on a different VM, compared the differences, and made the necessary changes to the one in my cluster.
NOTE: in my case the relevant flag is:
- --listen-client-urls=https://127.0.0.1:2379,https://192.168.100.21:2379
However, when running kubeadm upgrade plan after the etcd pod started, I got the following error:
root@k8s-node-1:~# kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error syncing endpoints with etc: context deadline exceeded
Will follow #1471 to regenerate the certificates for etcd when I have time.
Thanks terry for the info, I had the same problem. Also following #1471
My original cluster originated from Kubernetes 1.8 and was rebuilt during the 1.11 upgrade because it broke everything. My etcd also listens on localhost only:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://127.0.0.1:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://127.0.0.1:2380
    - --initial-cluster=phenomenal.edoburu.nl=https://127.0.0.1:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379
    - --listen-peer-urls=https://127.0.0.1:2380
    - --name=phenomenal.edoburu.nl
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd:3.2.24
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -ec
        - ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt
          --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
          get foo
      failureThreshold: 8
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: etcd
    resources: {}
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
status: {}
+1
Same problem here, while trying to upgrade from 1.13.4 to 1.14.0:
# kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error syncing endpoints with etc: dial tcp 192.168.10.2:2379: connect: connection refused
#
The fix should be up in 1.14.1 (to be released soon).
Update: I waited for kubeadm 1.14.1, it didn't actually fix the issue...
Luckily, simply by following the steps in #1471 mentioned by @mauilion, I was able to use the kubeadm phases (etcd-server certs and etcd) to regenerate the etcd TLS certificate so that it covers the k8s node IP, reconfigure etcd with the new listen-client-urls, start etcd, and then run kubeadm upgrade plan successfully.
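Roughly, the steps were along these lines (a sketch; see #1471 for the exact procedure, and back up /etc/kubernetes first):

# Move the old etcd server cert/key out of the way so kubeadm will reissue them
sudo mv /etc/kubernetes/pki/etcd/server.crt /etc/kubernetes/pki/etcd/server.crt.bak
sudo mv /etc/kubernetes/pki/etcd/server.key /etc/kubernetes/pki/etcd/server.key.bak
# Reissue the etcd server certificate (the new SAN includes the node IP)
sudo kubeadm init phase certs etcd-server
# Regenerate the etcd static pod manifest with the updated listen-client-urls
sudo kubeadm init phase etcd local
# Then re-run the upgrade plan
sudo kubeadm upgrade plan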
The reason kubeadm upgrade plan failed with the following error was that the etcd server TLS certificate's SAN did not cover the IP of the k8s node etcd was running on, so etcd simply failed to start.
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error syncing endpoints with etc: context deadline exceeded
The certificate SAN should look like the output below (a quick way to inspect it is shown after the listing):
X509v3 extensions:
    X509v3 Key Usage: critical
        Digital Signature, Key Encipherment
    X509v3 Extended Key Usage:
        TLS Web Server Authentication, TLS Web Client Authentication
    X509v3 Subject Alternative Name:
        DNS:k8s-node-1, DNS:localhost, IP Address:192.168.100.21, IP Address:127.0.0.1, IP Address:0:0:0:0:0:0:0:1, IP Address:10.192.0.2
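To inspect the SAN of the etcd server certificate (standard kubeadm certificate path):

sudo openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -text | grep -A1 "Subject Alternative Name"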
Update: I waited for kubeadm 1.14.1, it didn't actually fix the issue...
Hm, it should have. The PR that @fabriziopandini created was merged and tested by at least a couple of people.
The certificate SAN should look like below
And your existing cert was missing 192.168.100.21?
Yes, the existing etcd server certificate SAN was missing 192.168.100.21.
I may have forgotten to restore the static pod manifest for etcd (I had added the node's private IP inside the VPC subnet to --listen-client-urls as a workaround while troubleshooting); this may be why kubeadm upgrade plan still failed with v1.14.1.
Anyway, the problem is now solved.
I really appreciate your input and assistance, and enjoyed the learning experience ;-)
I may have forgotten to restore the static pod manifest for etcd (I had added the node's private IP inside the VPC subnet to --listen-client-urls as a workaround while troubleshooting); this may be why kubeadm upgrade plan still failed with v1.14.1.
Yes, that may be the cause.