Kubeadm: `kubeadm upgrade plan` and `kubeadm upgrade apply v1.10.1` both hang

Created on 16 Apr 2018 · 7 Comments · Source: kubernetes/kubeadm

Is this a BUG REPORT or FEATURE REQUEST?

/kind bug


Versions

kubeadm version (use kubeadm version): 1.10.1

kubeadm version: &version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.1", GitCommit:"d4ab47518836c750f9949b9e0d387f20fb92260b", GitTreeState:"clean", BuildDate:"2018-04-12T14:14:26Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version): 1.10.0
  • Cloud provider or hardware configuration: Bare metal, x86_64 (Xeon D-1518)
  • OS (e.g. from /etc/os-release): Debian Testing
  • Kernel (e.g. uname -a): 4.15.0-2 (Debian)
  • Others:

What happened?

kubeadm upgrade plan hangs after discovering that the latest version is v1.10.1 (left running for 10 minutes with no further progress). Output before hanging is:

root@prod-01:~# kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.10.0
[upgrade/versions] kubeadm version: v1.10.1
[upgrade/versions] Latest stable version: v1.10.1

Similarly, kubeadm upgrade apply v1.10.1 hangs before changing any manifests (control plane pods don't restart at all). Output before hanging:

root@prod-01:~# kubeadm upgrade apply v1.10.1
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade/version] You have chosen to change the cluster version to "v1.10.1"
[upgrade/versions] Cluster version: v1.10.0 , etcd 3.1.12
[upgrade/versions] kubeadm version: v1.10.1
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
[upgrade/prepull] Will prepull images for components [kube-apiserver kube-controller-manager kube-scheduler]
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.10.1"...

What you expected to happen?

kubeadm should not hang indefinitely before doing things.

How to reproduce it (as minimally and precisely as possible)?

Initialize a 1.10.0 cluster using kubeadm 1.10.0. Upgrade to kubeadm 1.10.1, and attempt to plan/execute an upgrade to 1.10.1.

Anything else we need to know?

At least two other people seem to have seen the exact same symptoms that I did.

kind/bug sig/cluster-lifecycle

All 7 comments

I should also add: Kubernetes itself is working fine (control plane up, responsive, scheduling pods...). I also tried rebooting this machine in case there was any wedged state anywhere, but it didn't help.

@kubernetes/sig-cluster-lifecycle-bugs

@danderson: Reiterating the mentions to trigger a notification:
@kubernetes/sig-cluster-lifecycle-bugs

In response to this:

I should also add: Kubernetes itself is working fine (control plane up, responsive, scheduling pods...). I also tried rebooting this machine in case there was any wedged state anywhere, but it didn't help.

@kubernetes/sig-cluster-lifecycle-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@danderson Thank you for reporting. We are aware of 2 distinct upgrade bugs and are working to get a fix into 1.10.2. I'm going to close this issue as a dupe.

/cc @liztio @detiber @stealthybox

@timothysc this is currently undocumented in the other issues, but I was seeing this this morning with existing TLS clusters being unable to upgrade.

The root of this symptom is that the Etcd client used for the pre-upgrade check doesn't support TLS.

https://github.com/kubernetes/kubernetes/pull/62655 does address this case

@danderson Thank you for reporting. We are aware of 2 distinct upgrade bugs and are working to get a fix into 1.10.2. I'm going to close this issue as a dupe.

I was unable to find the issues / PRs for these 2 upgrade bugs so I could track them, can someone reference them?

@vaizki they're referenced above

here they are again:

  • kubernetes/kubeadm/issues/740
  • kubernetes/kubernetes/issues/61764
  • kubernetes/kubernetes/pull/62655

This issue is gone with kubeadm 1.10.2: I just ran a successful upgrade on a cluster that hit this issue with kubeadm 1.10.1.
