kubeadm upgrade apply failure with same kubernetes version

Created on 13 Aug 2018 · 19 comments · Source: kubernetes/kubeadm

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version): v1.11.2

Environment:

  • Kubernetes version (use kubectl version): v1.11.2
  • Cloud provider or hardware configuration: AWS but no cloud provider configured
  • OS (e.g. from /etc/os-release): CoreOS 1800.6.0
  • Kernel (e.g. uname -a): Linux ip-172-31-35-161.eu-west-1.compute.internal 4.14.59-coreos-r2 #1 SMP Sat Aug 4 02:49:25 UTC 2018 x86_64 Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz GenuineIntel GNU/Linux
  • Others: CRI runtime: containerd v1.1.2

What happened?

Hi, I'm trying to upgrade my control plane with custom flags, just to test (I'm working on an Ansible solution to implement kubeadm HA).

To upgrade, I first run kubeadm upgrade diff --config kubeadm-config.yaml and then kubeadm upgrade apply --config kubeadm-config.yaml.

I've only added a flag to kube-apiserver for now. I can see it in the diff, but kubeadm still tries to restart the controller-manager and the scheduler, and also shows some diff between manifests (basically volumeMount entries moved up and down the file compared to the file generated by kubeadm init), so it tries to restart all three components. That seems to work, but it gets stuck when trying to restart the scheduler, even though the scheduler is running and waiting to acquire its lease.
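
For reference, the config file is of this general shape; a rough sketch only, using the v1alpha2 MasterConfiguration format that kubeadm v1.11 reads, and with an illustrative extra flag rather than the exact one I added:

apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.2
apiServerExtraArgs:
  audit-log-path: /var/log/kubernetes/kube-apiserver-audit.log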

I think this issue might be related to https://github.com/kubernetes/kubernetes/issues/65071: the hash may not be changing because it is the same Kubernetes version and no changes have been made to the pod.

Is this the proper way to modify the cluster configuration on an already bootstrapped cluster?

What you expected to happen?

I expect the control plane components to restart with the right flags/config added to the kubeadm config file.

How to reproduce it (as minimally and precisely as possible)?

  • Bootstrap cluster with Kubeadm
  • Add a custom config flag to apiserver for example
  • run kubeadm upgrade apply --config kubeadm-config.yaml
  • wait until timeout
Static pod: kube-scheduler-ip-172-31-35-161.eu-west-1.compute.internal hash: a00c35e56ebd0bdfcd77d53674a5d2a1                                                       
Static pod: kube-scheduler-ip-172-31-35-161.eu-west-1.compute.internal hash: a00c35e56ebd0bdfcd77d53674a5d2a1                                                       
Static pod: kube-scheduler-ip-172-31-35-161.eu-west-1.compute.internal hash: a00c35e56ebd0bdfcd77d53674a5d2a1                                                       
[upgrade/apply] FATAL: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: [timed out waiting for the condition]
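
While it sits in that loop, a rough way to see what kubeadm is waiting on (assuming the default manifest path, and that the hash it prints tracks the kubelet's kubernetes.io/config.hash annotation on the mirror pod) is to compare the manifest on disk with the running mirror pod:

sha256sum /etc/kubernetes/manifests/kube-scheduler.yaml
kubectl -n kube-system get pod kube-scheduler-ip-172-31-35-161.eu-west-1.compute.internal -o yaml | grep config.hash

If the manifest checksum never changes across the apply, the kubelet has nothing new to restart, which matches the same-version/same-pod theory above.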

Anything else we need to know?

When changing the version, the upgrade completes successfully: for example, I tried downgrading to 1.11.1 and then upgrading back to 1.11.2.

area/upgrades priority/important-soon

Most helpful comment

My workaround was the same (downgrade then upgrade).

All 19 comments

Just adding the link without the typo again, so that GitHub can link the issues: https://github.com/kubernetes/kubernetes/issues/65071

Same issue here with a single-master setup. Current Kubernetes version: 1.11.2. We are just trying to add flags, and the kubeadm upgrade process keeps getting stuck on the scheduler in the same way.

Exact same issue. 1.11.2, single master. Trying kubeadm upgrade apply --config flags.yaml and getting stuck at the API server and then the scheduler restarting.

Still an issue on 1.12.0. Docs mention that kubeadm upgrade apply is supposed to be idempotent.

/assign @timothysc
/cc @rdodev

@xiangpengzhao Thanks for the review. Done.

Hey @ArchiFleKs (and others in this thread), I want to replicate this as closely as possible. Would you folks mind sharing the config and the flags you attempted to change?

@rdodev I'm hitting this bug when I do the following:

  • have a cluster running with configuration from /etc/kubeadm.yaml
  • then I change only some kubeletConfiguration option in /etc/kubeadm.yaml (see the sketch after this list)

    • so there are no changes in kubernetesVersion

    • nor in any cluster configuration

  • then I run kubeadm upgrade apply --config /etc/kubeadm.yaml
  • the expected result is:

    • no changes in any files in /etc/kubernetes/manifests/

    • all component versions unchanged

    • only the kubelet configuration should change in the configmaps: kubeadm-config and kubelet-config-1.11

  • the actual result:

    • the upgrade gets stuck at the [upgrade/staticpods] Waiting for the kubelet to restart the component step
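
To make the kind of change concrete, here is a rough sketch of the relevant part of /etc/kubeadm.yaml as I understand the v1alpha2 layout, where kubelet settings sit under kubeletConfiguration.baseConfig; the version string and the evictionHard value are placeholders, not my exact change:

apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.2
kubeletConfiguration:
  baseConfig:
    evictionHard:
      memory.available: "200Mi"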


I think even a simpler scenario should work (as kubeadm is idempotent):

  1. kubeadm config view > /etc/kubeadm.yaml
  2. kubeadm upgrade apply --config /etc/kubeadm.yaml

So it should succeed even without any changes in configuration at all.
But it ends with the error:

[upgrade/apply] FATAL: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: [timed out waiting for the condition]

@ttarczynski great. Thanks for that. I think we found a different regression because of this :)

@ttarczynski can you check whether this PR solves the issue for you?

@bart0sh yes, I can try it, but I don't know how to get a binary with this patch applied.
Is it something I can do easily if I've never built kubernetes/kubeadm from source?

@bart0sh I've just managed to build the binary from PR #69886 and test it.
I've followed these steps:

  1. Have a cluster running version v1.12.1:
# kubectl version --short
Client Version: v1.12.1
Server Version: v1.12.1

# rpm -qa | egrep '^kube'
kubectl-1.12.1-2.x86_64
kubelet-1.12.1-2.x86_64
kubeadm-1.12.1-2.x86_64
  2. Put the new version of kubeadm in /tmp/kubeadm.PR69886
  3. Used this new kubeadm to write the config to a file
    (kubeadm config view seems to be broken in v1.12.1 as mentioned in #1174)
/tmp/kubeadm.PR69886 config view > /etc/kubeadm.yaml
  4. Run kubeadm v1.12.1 to make sure it still ends with an error:
# kubeadm upgrade apply --config /etc/kubeadm.yaml 
...
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-scheduler-ksb-m1.grey hash: 2117f54c43e401f807b7c9744c2a63be
...
[upgrade/apply] FATAL: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: [timed out waiting for the condition]
  5. Tested with kubeadm version from PR #69886
# /tmp/kubeadm.PR69886 upgrade apply --config /etc/kubeadm.yaml --force
...
[upgrade/staticpods] current and new manifests of kube-apiserver are equal, skipping upgrade
...
[upgrade/staticpods] current and new manifests of kube-scheduler are equal, skipping upgrade
...
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.12.1". Enjoy!

So it seems to me that PR #69886 does fix this issue.

@ArchiFleKs

I've only added a flag to kube-apiserver for now. I can see it in the diff, but kubeadm still tries to restart the controller-manager and the scheduler, and also shows some diff between manifests (basically volumeMount entries moved up and down the file compared to the file generated by kubeadm init).

This should be fixed by this PR

Was this issue ever fixed in v1.11? Or only in v1.12?

@ocofaigh I think the patch (PR #69886) is only available in v1.13 and not backported to older versions.

You can find this info in the release notes:

Fixed 'kubeadm upgrade' infinite loop waiting for pod restart (#69886, @bart0sh)

Hmm, is there any way I can work around this for v1.11? My scenario is I have a kubeadm v1.11.6 cluster set up, and I want to enable the PodSecurityPolicy admission plugin. So I should be able to:

  1. kubeadm config view > kubeadm-config.yaml
  2. Edit the kubeadm-config.yaml, locate the apiServerExtraArgs settings, and enable the PodSecurityPolicy admission plugin, e.g.:
apiServerExtraArgs:
  enable-admission-plugins: PodSecurityPolicy
  3. kubeadm upgrade apply --config=kubeadm-config.yaml

However, I get stuck in a loop outputting:
[upgrade/staticpods] Waiting for the kubelet to restart the component

I guess I could downgrade my cluster version to v1.11.5, and then enable the PodSecurityPolicy admission plugin as part of the upgrade from v1.11.5 -> v1.11.6, but that seems extreme. Anyone got a better idea?

My workaround was the same (downgrade then upgrade).
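
For anyone trying the same thing, my understanding of the workaround is: set kubernetesVersion in kubeadm-config.yaml to the lower patch release (v1.11.5 in the scenario above) while keeping the new apiServerExtraArgs, apply it, then set kubernetesVersion back to v1.11.6 and apply again. Roughly (the --force on the downgrade step is an assumption on my part and may not be needed):

kubeadm upgrade apply --config kubeadm-config.yaml --force
kubeadm upgrade apply --config kubeadm-config.yaml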

Can anyone else confirm that this fix wasn't backported to 1.12?

@brysonshepherd it wasn't.
