Kubeadm: Upgrading a 1.12 cluster through 1.13 to 1.14 fails.

Created on 28 Mar 2019 · 10 comments · Source: kubernetes/kubeadm

BUG REPORT

Versions

kubeadm version (use kubeadm version): 1.14.0

Environment:

  • Kubernetes version (use kubectl version): 1.13.5
  • Cloud provider or hardware configuration: local
  • OS (e.g. from /etc/os-release): any
  • Kernel (e.g. uname -a): any
  • Others:

What happened?

In 1.12 we bound etcd to localhost on single-master setups. We also minted certificates whose SANs included only the hostname, localhost, and the 127.0.0.1 IP.

[certificates] etcd/server serving cert is signed for DNS names [kube-master localhost] and IPs [127.0.0.1 ::1]

In 1.13 we changed that behavior and started binding etcd to 127.0.0.1 and the node IP.
We also updated the cert generation to pick up the change.
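One way to tell which era your etcd serving cert comes from is to inspect its SANs directly. This assumes the default kubeadm PKI layout; a cert minted by 1.12 will be missing the node IP:

```shell
# Print the SANs on the etcd serving certificate; on a cluster whose certs
# were minted by kubeadm 1.12, the IP list contains only 127.0.0.1 and ::1.
openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -text \
  | grep -A1 'Subject Alternative Name'
```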

You can upgrade a cluster from 1.12 to 1.13 with no issues, because kubeadm upgrade plan still probes etcd on localhost.

When you then try to upgrade the 1.13 cluster to 1.14, the upgrade fails because in 1.14 kubeadm probes etcd on the node IP. Assuming etcd is bound to the node IP is valid for a cluster created with 1.13, but this cluster was originally created with 1.12, and its etcd is still bound only to 127.0.0.1.

What you expected to happen?

Either that kubeadm would determine which address etcd is actually bound to, or that the 1.13 upgrade would have modified the etcd configuration so that clusters originally created with 1.12 are not stranded.

How to reproduce it (as minimally and precisely as possible)?

Bring up a 1.12 single-master cluster.
Upgrade it to 1.13.
Try to upgrade it to 1.14.

Anything else we need to know?

You can work around this issue as follows, using the kubeadm binary matching the Kubernetes version you are currently on:

  1. Fetch the kubeadm.conf from the cluster:
 kubeadm config view > /etc/kubeadm.conf
  1. Amend the etcd section in kubeadm.conf to look something like this:
etcd:
  local:
    dataDir: /var/lib/etcd
    image: ""
    serverCertSANs:
    - "10.192.0.2"
    extraArgs:
      listen-client-urls: https://127.0.0.1:2379,https://10.192.0.2:2379 

where 10.192.0.2 is the node IP.

  1. Remove the existing etcd server certs so they can be regenerated with a phase:
rm /etc/kubernetes/pki/etcd/server.*
  1. Mint new ones. You should see the new IP SAN in effect:
kubeadm init phase certs etcd-server --config /etc/kubeadm.conf
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kube-master localhost] and IPs [10.192.0.2 127.0.0.1 ::1]
  1. Use a phase to reconfigure etcd with the new listen-client-urls:
kubeadm init phase etcd local --config /etc/kubeadm.conf
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"

You should now see etcd listening on port 2379 on both 127.0.0.1 and 10.192.0.2:

ss -ln | grep 2379                                                                                                                                                       
tcp    LISTEN     0      128    127.0.0.1:2379                  *:*                  
tcp    LISTEN     0      128    10.192.0.2:2379                  *:*     
  1. Upload the kubeadm.conf to the cluster.
kubeadm config upload from-file --config /etc/kubeadm.conf
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace

Now you can grab the new kubeadm and upgrade.
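As an aside, if you are unsure which address to use as the node IP in the steps above (10.192.0.2 in the example), one common trick is to ask the kernel which source address it would use to reach an external host. The probe address 1.1.1.1 below is arbitrary, and no packet is actually sent:

```shell
# Print the source address of the route to an external host; this is
# typically the node IP that kubeadm would advertise.
ip -o route get 1.1.1.1 | sed -n 's/.*src \([0-9.]*\).*/\1/p'
```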

area/upgrades kind/bug lifecycle/active priority/important-soon

Most helpful comment

should be resolved in 1.14.1

All 10 comments

I also did:

  1. Restart kubelet:
systemctl restart kubelet.service

so that everything was reloaded and ready.

@mauilion thanks for the detailed report.
i think this was caught here: https://github.com/kubernetes/kubeadm/issues/1469

seems like we can close 1469 as this ticket outlines the problem better.

@kubernetes/sig-cluster-lifecycle

Experiencing the same issue as @neolit123 describes in https://github.com/kubernetes/kubeadm/issues/1469#issue-425890196.

Etcd manifests have not been manually altered, cluster was created before k8s 1.12.

/assign
/lifecycle active

should be resolved in 1.14.1

I just upgraded a cluster through 1.13.x to 1.14.4, which worked OK. I then tried to upgrade to 1.15.0 and it failed:
[root@evan3 ~]# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
error syncing endpoints with etc: dial tcp xxx.xxx.xxx.xxx:2379: connect: connection refused

It was referencing the external address.

I do still see references to localhost in the etcd manifest:
grep 127.0.0.1 /etc/kubernetes/manifests/etcd.yaml
- --advertise-client-urls=https://127.0.0.1:2379
- --initial-advertise-peer-urls=https://127.0.0.1:2380
- --initial-cluster=evan3=https://127.0.0.1:2380
- --listen-client-urls=https://127.0.0.1:2379
- --listen-peer-urls=https://127.0.0.1:2380
- ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt

Should I just update all references to 127.0.0.1 to the external address in /etc/kubernetes/manifests/etcd.yaml or is there more to it?

Try patching it manually. This should not have happened, as we had a special case to handle etcd not being bound to localhost during upgrade.
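A minimal sketch of that manual patch, assuming a node IP of 10.192.0.2 and the default manifest path. It covers the client URLs only (peer URLs can be treated the same way if needed), and note that listen-client-urls should keep 127.0.0.1 alongside the node IP, as in the workaround above, rather than replace it:

```shell
NODE_IP=10.192.0.2                  # substitute your node's actual IP
M=/etc/kubernetes/manifests/etcd.yaml
cp "$M" /tmp/etcd.yaml.bak          # keep a backup

# Advertise the node IP instead of loopback.
sed -i "s#--advertise-client-urls=https://127.0.0.1:2379#--advertise-client-urls=https://${NODE_IP}:2379#" "$M"

# Listen on both loopback and the node IP.
sed -i "s#--listen-client-urls=https://127.0.0.1:2379#--listen-client-urls=https://127.0.0.1:2379,https://${NODE_IP}:2379#" "$M"

# kubelet watches the manifests directory and will restart etcd on its own.
```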

I need to upgrade my EKS cluster from 1.12 to 1.14.
Can someone advise me on the best way to do this?
