kubeadm upgrade plan not working for v1.13.5 to v1.14.0

Created on 27 Mar 2019 · 14 comments · Source: kubernetes/kubeadm

Is this a BUG REPORT or FEATURE REQUEST?

Bug Report

Versions

kubeadm version (use kubeadm version):
v1.14.0

Environment:

  • Kubernetes version (use kubectl version): v1.13.5
  • Cloud provider or hardware configuration: AWS EC2

  • OS (e.g. from /etc/os-release): Ubuntu 16.04.6 LTS

  • Kernel (e.g. uname -a): Linux k8s-node-1 4.4.0-143-generic #169-Ubuntu SMP Thu Feb 7 07:56:38 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

  • Others: kubeadm-provisioned single-master k8s cluster (3 nodes); this cluster was originally created using kubeadm when k8s was at v1.9.0.

What happened?

Used kubeadm to upgrade the cluster. The v1.13.4 to v1.13.5 upgrade was successful. The upgrade to v1.14.0 failed because the kubeadm upgrade plan pre-flight checks try to connect to etcd using the node's private IP (assigned to NIC eth0) instead of the loopback address etcd binds to.

Error

root@k8s-node-1:~# sudo kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error syncing endpoints with etc: dial tcp 192.168.100.21:2379: connect: connection refused

As per the kubeadm init workflow, in a single-master k8s cluster the etcd pod is created via a static pod manifest. Looking at the manifest, etcd binds only to 127.0.0.1 and is not exposed to the outside world.

root@k8s-node-1:/etc/kubernetes/manifests# cat etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://127.0.0.1:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://127.0.0.1:2380
    - --initial-cluster=k8s-node-1=https://127.0.0.1:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379
    - --listen-peer-urls=https://127.0.0.1:2380
    - --name=k8s-node-1
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd:3.2.24
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -ec
        - ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt
          --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
          get foo
      failureThreshold: 8
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: etcd
    resources: {}
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
status: {}
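
For reference, a quick way to confirm that etcd is only reachable over loopback (a sketch; etcdctl is not installed on the host by default, so that part can also be run from inside the etcd container instead):

# show which addresses have a listener on the etcd client port 2379
$ sudo ss -tlnp | grep 2379
# query etcd over the loopback address it binds to, using the healthcheck client cert
$ sudo env ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
    --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
    endpoint health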

What you expected to happen?

kubeadm upgrade plan should work as expected and output the upgrade details, just like the v1.13.4 to v1.13.5 upgrade did.

ubuntu@k8s-node-1:~$ sudo kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.13.4
[upgrade/versions] kubeadm version: v1.13.5
I0327 09:52:14.655319   12224 version.go:237] remote version is much newer: v1.14.0; falling back to: stable-1.13
[upgrade/versions] Latest stable version: v1.13.5
[upgrade/versions] Latest version in the v1.13 series: v1.13.5

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       AVAILABLE
Kubelet     2 x v1.13.4   v1.13.5
            1 x v1.13.5   v1.13.5

Upgrade to the latest version in the v1.13 series:

COMPONENT            CURRENT   AVAILABLE
API Server           v1.13.4   v1.13.5
Controller Manager   v1.13.4   v1.13.5
Scheduler            v1.13.4   v1.13.5
Kube Proxy           v1.13.4   v1.13.5
CoreDNS              1.2.6     1.2.6
Etcd                 3.2.24    3.2.24

You can now apply the upgrade by executing the following command:

        kubeadm upgrade apply v1.13.5

_____________________________________________________________________

ubuntu@k8s-node-1:~$ sudo kubeadm upgrade apply v1.13.5

How to reproduce it (as minimally and precisely as possible)?

Follow the upgrade guide and upgrade any v1.13.x cluster (created using kubeadm) to v1.14.0.

I've tried to change the bind address, but it has so many dependencies that it breaks more than it fixes. I also tried to expose the pod as a NodePort service, and tried iptables rules to forward traffic destined for the IP address (192.168.100.12 in this case) on port 2379 to loopback, with no luck.
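
For completeness, the iptables attempt was roughly along these lines (a sketch, not a working fix; even if the redirect works at the TCP level, kubeadm still dials the node IP, which is not in the etcd server certificate's SAN, so TLS verification fails anyway):

# may be needed so packets rewritten to 127.0.0.1 are not dropped as martians
$ sudo sysctl -w net.ipv4.conf.all.route_localnet=1
# rewrite locally generated traffic destined for the node IP on port 2379 to the loopback listener
$ sudo iptables -t nat -A OUTPUT -p tcp -d 192.168.100.12 --dport 2379 \
    -j DNAT --to-destination 127.0.0.1:2379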

Is there a way to override the etcd endpoint when running kubeadm upgrade plan? That would be the easiest solution.

Anything else we need to know?

Hmm...

area/upgrades  help wanted  priority/awaiting-more-evidence

All 14 comments

thanks for the report.
i will try to reproduce your problem.

$ kubectl get nodes
NAME         STATUS   ROLES    AGE    VERSION
luboitvbox   Ready    master   6m9s   v1.13.5

$ kubeadm version --output=short
v1.14.0

$ sudo kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.13.5
[upgrade/versions] kubeadm version: v1.14.0

Awesome, you're up-to-date! Enjoy!

here is what i get. this is a bit of a bug on its own, because it's telling me that i'm up to date while it should be telling me to update to v1.14.0. i will log a bug about this "Awesome, you're up-to-date! Enjoy!" case.

but my etcd manifest looks like this:

$ sudo cat /etc/kubernetes/manifests/etcd.yaml
...
containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.0.102:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://192.168.0.102:2380
    - --initial-cluster=luboitvbox=https://192.168.0.102:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.0.102:2379
    - --listen-peer-urls=https://192.168.0.102:2380
    - --name=luboitvbox
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd:3.2.24
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -ec
        - ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt
          --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
...

did you happen to create this cluster using 1.12 before upgrading to 1.13?

i remember that we made some changes to the etcd addresses related to HA setups.
try making your manifest like the above.
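
roughly something like this (a sketch; back up the file first, the kubelet restarts the etcd static pod as soon as the manifest changes on disk):

# keep a copy of the current manifest outside of /etc/kubernetes/manifests
$ sudo cp /etc/kubernetes/manifests/etcd.yaml /root/etcd.yaml.bak
# edit the flags so the node IP is used for --advertise-client-urls,
# --initial-advertise-peer-urls, --initial-cluster and --listen-peer-urls,
# and both addresses are kept in --listen-client-urls, e.g.
#   --listen-client-urls=https://127.0.0.1:2379,https://<node-ip>:2379
$ sudo vi /etc/kubernetes/manifests/etcd.yaml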

$ sudo kubeadm upgrade apply v1.14.0
...
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.14.0". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.

...
# (upgrade kubelet)
$ sudo systemctl restart kubelet
$ kubectl get nodes
NAME         STATUS   ROLES    AGE   VERSION
luboitvbox   Ready    master   23m   v1.14.0

the upgrade worked for me.

here is what i get. this is a bit of a bug on its own, because it's telling me that i'm up to date while it should be telling me to update to v1.14.0. i will log a bug about this "Awesome, you're up-to-date! Enjoy!" case.

logged:
https://github.com/kubernetes/kubeadm/issues/1470

@neolit123 Thanks for the comments. The cluster was initially created using kubeadm v1.9.x and was later rebuilt, definitely using a version before v1.12.0. No wonder the etcd static pod manifest is different.

What exactly has changed in the etcd manifest (since v1.12.0)? I tried to search for that, but no luck.

I'll try to generate new static pod manifests using the latest version of kubeadm on a different machine and see if I can figure it out (also the dependencies).
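
The plan is roughly this on a scratch VM with kubeadm v1.14 (a sketch using the default configuration), just to see what freshly generated certificates and manifest look like:

# generate the etcd CA and the server/peer/healthcheck client certificates
$ sudo kubeadm init phase certs etcd-ca
$ sudo kubeadm init phase certs etcd-server
$ sudo kubeadm init phase certs etcd-peer
$ sudo kubeadm init phase certs etcd-healthcheck-client
# write the static pod manifest for a local (stacked) etcd
$ sudo kubeadm init phase etcd local
# compare against the manifest from the real cluster
$ sudo cat /etc/kubernetes/manifests/etcd.yaml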

@terrywang
it was done here so that we can properly support stacked etcd members in an HA setup:
https://github.com/kubernetes/kubernetes/pull/69486

more details here:
https://github.com/kubernetes/kubeadm/issues/1123

that said i think we had a way to handle this type of upgrade transparently between 1.12 and 1.13, so your 1.13 etcd manifest should have been auto-converted to use the network interface address. possibly something went wrong in the process, but also this is the first report we are seeing related to this.

please let me know if you remember anything like modifying the etcd manifests manually, which could have broken our 1.12->1.13 logic.

@neolit123 Thanks again for the info. Good to know.

I've regenerated the static pod manifests by running the kubeadm phase with the latest version of kubeadm on a different VM, compared the differences, and made the necessary changes to the manifest in my cluster.

NOTE: in my case - --listen-client-urls=https://127.0.0.1:2379,https://192.168.100.21:2379.

However, when running kubeadm upgrade plan after the etcd pod was started, I got the following error:

root@k8s-node-1:~# kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error syncing endpoints with etc: context deadline exceeded

Will follow #1471 to regenerate the certificates for etcd when I have time.
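
A rough sketch of what that regeneration might look like with kubeadm v1.14 (not necessarily the exact steps from #1471; paths are the kubeadm defaults):

# back up and remove the old etcd server certificate and key so kubeadm will regenerate them
$ sudo mv /etc/kubernetes/pki/etcd/server.crt /etc/kubernetes/pki/etcd/server.crt.bak
$ sudo mv /etc/kubernetes/pki/etcd/server.key /etc/kubernetes/pki/etcd/server.key.bak
# recreate the server certificate (add --config if a kubeadm config file is in use)
$ sudo kubeadm init phase certs etcd-server
# recreate the etcd static pod so it picks up the new certificate
$ sudo mv /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml
$ sleep 20
$ sudo mv /tmp/etcd.yaml /etc/kubernetes/manifests/etcd.yaml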

Thanks terry for the info, I had the same problem. Also following #1471

My original cluster originated from Kubernetes 1.8 and was rebuilt during the 1.11 upgrade because it broke everything. My etcd also listens on localhost only:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ""
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://127.0.0.1:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://127.0.0.1:2380
    - --initial-cluster=phenomenal.edoburu.nl=https://127.0.0.1:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379
    - --listen-peer-urls=https://127.0.0.1:2380
    - --name=phenomenal.edoburu.nl
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd:3.2.24
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -ec
        - ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt
          --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
          get foo
      failureThreshold: 8
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: etcd
    resources: {}
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
status: {}

+1
Same problem here, while trying to upgrade from 1.13.4 to 1.14.0:

# kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error syncing endpoints with etc: dial tcp 192.168.10.2:2379: connect: connection refused
#

fix should be up in 1.14.1
(to be released soon)

Update: I waited for kubeadm 1.14.1, it didn't actually fix the issue...

Luckily, simply by following the steps in #1471 mentioned by @mauilion, I was able to use the kubeadm phases (etcd-server and etcd) to regenerate the etcd TLS certificate so that it covers the k8s node IP, reconfigure etcd with the new listen-client-urls, start etcd, and then run kubeadm upgrade plan successfully.

The reason kubeadm upgrade plan failed with the following error was that the etcd server TLS certificate's SAN did not cover the IP of the k8s node etcd was running on, so it simply failed to start.

[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error syncing endpoints with etc: context deadline exceeded

The certificate SAN should look like the one below:

        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Alternative Name:
                DNS:k8s-node-1, DNS:localhost, IP Address:192.168.100.21, IP Address:127.0.0.1, IP Address:0:0:0:0:0:0:0:1, IP Address:10.192.0.2
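
For reference, that SAN can be checked on the node with something like:

$ sudo openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -text \
    | grep -A1 "Subject Alternative Name"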

Update: I waited for kubeadm 1.14.1, it didn't actually fix the issue...

hm, it should have. the PR that @fabriziopandini created was merged and tested by at least a couple of people.

The certificate SAN should look like below

and your existing cert was missing 192.168.100.21?

Yes, the existing etcd server certificate SAN was missing 192.168.100.21.

I may have forgotten to restore the static pod manifest for etcd (I had added the node's private IP inside a VPC subnet to --listen-client-urls as a workaround while troubleshooting); this may be the reason why kubeadm upgrade plan still failed with v1.14.1.

Anyway, the problem is now solved.

Really appreciate your input and assistance, enjoyed the learning experience ;-)

I may have forgotten to restore the static pod manifest for etcd (I had added the node's private IP inside a VPC subnet to --listen-client-urls as a workaround while troubleshooting); this may be the reason why kubeadm upgrade plan still failed with v1.14.1.

yes, that may be the cause.
