Kubeadm: Upgrading a 1.12 cluster through 1.13 to 1.14 fails.

Created on 28 Mar 2019 · 10 comments · Source: kubernetes/kubeadm

BUG REPORT

Versions

kubeadm version (use kubeadm version): 1.14.0

Environment:

  • Kubernetes version (use kubectl version): 1.13.5
  • Cloud provider or hardware configuration: local
  • OS (e.g. from /etc/os-release): any
  • Kernel (e.g. uname -a): any
  • Others:

What happened?

In 1.12 we bound etcd to localhost on single-master setups. We also minted certificates whose SANs included only the hostname, localhost, and the 127.0.0.1 IP.

[certificates] etcd/server serving cert is signed for DNS names [kube-master localhost] and IPs [127.0.0.1 ::1]

In 1.13 we changed that behavior and started binding etcd to 127.0.0.1 and the node IP.
We also updated the cert generation to pick up the change.
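One way to tell which era your etcd serving cert comes from is to inspect its SANs directly. This assumes the default kubeadm PKI layout; a cert minted by 1.12 will be missing the node IP:

```shell
# Print the SANs on the etcd serving certificate; on a cluster whose certs
# were minted by kubeadm 1.12, the IP list contains only 127.0.0.1 and ::1.
openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -text \
  | grep -A1 'Subject Alternative Name'
```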

You can upgrade a cluster from 1.12 to 1.13 with no issues, because kubeadm upgrade plan still probes etcd on localhost.

When you then try to upgrade the 1.13 cluster to 1.14, the upgrade fails because in 1.14 kubeadm probes etcd on the node IP. Assuming etcd is bound to the node IP is valid for a cluster created with 1.13, but this cluster was originally created with 1.12, and its etcd is still bound only to 127.0.0.1.

What you expected to happen?

Either that kubeadm would determine which address etcd is actually bound to, or that the 1.13 upgrade would have modified the etcd configuration so that clusters originally created with 1.12 are not stranded.

How to reproduce it (as minimally and precisely as possible)?

Bring up a 1.12 single-master cluster.
Upgrade it to 1.13.
Try to upgrade it to 1.14.

Anything else we need to know?

You can work around this issue as follows, using the kubeadm binary matching the Kubernetes version you are currently on:

  1. Fetch the kubeadm.conf from the cluster:
 kubeadm config view > /etc/kubeadm.conf
  1. Amend the etcd section in kubeadm.conf to look something like this:
etcd:
  local:
    dataDir: /var/lib/etcd
    image: ""
    serverCertSANs:
    - "10.192.0.2"
    extraArgs:
      listen-client-urls: https://127.0.0.1:2379,https://10.192.0.2:2379 

where 10.192.0.2 is the node IP.

  1. Remove the existing etcd server certs so they can be regenerated with a phase:
rm /etc/kubernetes/pki/etcd/server.*
  1. Mint new ones. You should see the new IP SAN in effect:
kubeadm init phase certs etcd-server --config /etc/kubeadm.conf
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kube-master localhost] and IPs [10.192.0.2 127.0.0.1 ::1]
  1. Use a phase to reconfigure etcd with the new listen-client-urls:
kubeadm init phase etcd local --config /etc/kubeadm.conf
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"

You should now see etcd listening on port 2379 on both 127.0.0.1 and 10.192.0.2:

ss -ln | grep 2379                                                                                                                                                       
tcp    LISTEN     0      128    127.0.0.1:2379                  *:*                  
tcp    LISTEN     0      128    10.192.0.2:2379                  *:*     
  1. Upload the kubeadm.conf to the cluster.
kubeadm config upload from-file --config /etc/kubeadm.conf
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace

Now you can grab the new kubeadm and upgrade.
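As an aside, if you are unsure which address to use as the node IP in the steps above (10.192.0.2 in the example), one common trick is to ask the kernel which source address it would use to reach an external host. The probe address 1.1.1.1 below is arbitrary, and no packet is actually sent:

```shell
# Print the source address of the route to an external host; this is
# typically the node IP that kubeadm would advertise.
ip -o route get 1.1.1.1 | sed -n 's/.*src \([0-9.]*\).*/\1/p'
```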

area/upgrades kind/bug lifecycle/active priority/important-soon

Most helpful comment

should be resolved in 1.14.1

All 10 comments

I also did:

  1. Restart kubelet:
systemctl restart kubelet.service

so that everything was reloaded and ready.

@mauilion thanks for the detailed report.
i think this was caught here: https://github.com/kubernetes/kubeadm/issues/1469

seems like we can close 1469 as this ticket outlines the problem better.

@kubernetes/sig-cluster-lifecycle

Experiencing the same issue as @neolit123 describes in https://github.com/kubernetes/kubeadm/issues/1469#issue-425890196.

Etcd manifests have not been manually altered, cluster was created before k8s 1.12.

/assign
/lifecycle active

should be resolved in 1.14.1

I just upgraded a cluster through 1.13.x to 1.14.4, which worked OK. I then tried to upgrade to 1.15.0 and it failed:
[root@evan3 ~]# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
error syncing endpoints with etc: dial tcp xxx.xxx.xxx.xxx:2379: connect: connection refused

It was referencing the external address.

I do still see references to localhost in the etcd manifest:
grep 127.0.0.1 /etc/kubernetes/manifests/etcd.yaml
- --advertise-client-urls=https://127.0.0.1:2379
- --initial-advertise-peer-urls=https://127.0.0.1:2380
- --initial-cluster=evan3=https://127.0.0.1:2380
- --listen-client-urls=https://127.0.0.1:2379
- --listen-peer-urls=https://127.0.0.1:2380
- ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt

Should I just update all references to 127.0.0.1 to the external address in /etc/kubernetes/manifests/etcd.yaml or is there more to it?

Try patching it manually. This should not have happened, as we had a special case to handle etcd not being bound to localhost during upgrade.
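A minimal sketch of that manual patch, assuming a node IP of 10.192.0.2 and the default manifest path. It covers the client URLs only (peer URLs can be treated the same way if needed), and note that listen-client-urls should keep 127.0.0.1 alongside the node IP, as in the workaround above, rather than replace it:

```shell
NODE_IP=10.192.0.2                  # substitute your node's actual IP
M=/etc/kubernetes/manifests/etcd.yaml
cp "$M" /tmp/etcd.yaml.bak          # keep a backup

# Advertise the node IP instead of loopback.
sed -i "s#--advertise-client-urls=https://127.0.0.1:2379#--advertise-client-urls=https://${NODE_IP}:2379#" "$M"

# Listen on both loopback and the node IP.
sed -i "s#--listen-client-urls=https://127.0.0.1:2379#--listen-client-urls=https://127.0.0.1:2379,https://${NODE_IP}:2379#" "$M"

# kubelet watches the manifests directory and will restart etcd on its own.
```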

I need to upgrade my EKS cluster from 1.12 to 1.14.
Can someone advise me on the best way to do this?
