kubeadm upgrade apply failure with same kubernetes version

Created on 13 Aug 2018 · 19 comments · Source: kubernetes/kubeadm

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version): v1.11.2

Environment:

  • Kubernetes version (use kubectl version): v1.11.2
  • Cloud provider or hardware configuration: AWS but no cloud provider configured
  • OS (e.g. from /etc/os-release): CoreOS 1800.6.0
  • Kernel (e.g. uname -a): Linux ip-172-31-35-161.eu-west-1.compute.internal 4.14.59-coreos-r2 #1 SMP Sat Aug 4 02:49:25 UTC 2018 x86_64 Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz GenuineIntel GNU/Linux
  • Others: CRI runtime: containerd v1.1.2

What happened?

Hi, I'm trying to upgrade my control plane with custom flags, just to test (I'm working on an Ansible solution to implement kubeadm HA).

To upgrade, I first run kubeadm upgrade diff --config kubeadm-config.yaml and then kubeadm upgrade apply --config kubeadm-config.yaml.

I've only added a flag to kube-apiserver for now. I can see it in the diff, but kubeadm still tries to restart the controller-manager and the scheduler, and also shows some diff between manifests (basically volumeMount entries moved up and down the file compared to the file generated by kubeadm init), so it tries to restart all three components. That seems to work, but it gets stuck when trying to restart the scheduler, even though the scheduler is running and waiting to acquire its lease.
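
For reference, the config file is of this general shape; a rough sketch only, using the v1alpha2 MasterConfiguration format that kubeadm v1.11 reads, and with an illustrative extra flag rather than the exact one I added:

apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.2
apiServerExtraArgs:
  audit-log-path: /var/log/kubernetes/kube-apiserver-audit.log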

I think this issue might be related to https://github.com/kubernetes/kubernetes/issues/65071: the hash may not be changing because it is the same Kubernetes version and no changes have been made to the pod.

Is this the proper way to modify the cluster configuration on an already bootstrapped cluster?

What you expected to happen?

I expect the control plane components to restart with the right flags/config added to the kubeadm config file.

How to reproduce it (as minimally and precisely as possible)?

  • Bootstrap cluster with Kubeadm
  • Add a custom config flag to apiserver for example
  • run kubeadm upgrade apply --config kubeadm-config.yaml
  • wait until timeout
Static pod: kube-scheduler-ip-172-31-35-161.eu-west-1.compute.internal hash: a00c35e56ebd0bdfcd77d53674a5d2a1                                                       
Static pod: kube-scheduler-ip-172-31-35-161.eu-west-1.compute.internal hash: a00c35e56ebd0bdfcd77d53674a5d2a1                                                       
Static pod: kube-scheduler-ip-172-31-35-161.eu-west-1.compute.internal hash: a00c35e56ebd0bdfcd77d53674a5d2a1                                                       
[upgrade/apply] FATAL: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: [timed out waiting for the condition]
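
While it sits in that loop, a rough way to see what kubeadm is waiting on (assuming the default manifest path, and that the hash it prints tracks the kubelet's kubernetes.io/config.hash annotation on the mirror pod) is to compare the manifest on disk with the running mirror pod:

sha256sum /etc/kubernetes/manifests/kube-scheduler.yaml
kubectl -n kube-system get pod kube-scheduler-ip-172-31-35-161.eu-west-1.compute.internal -o yaml | grep config.hash

If the manifest checksum never changes across the apply, the kubelet has nothing new to restart, which matches the same-version/same-pod theory above.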

Anything else we need to know?

When changing the version, the upgrade completes successfully: for example, I tried downgrading to 1.11.1 and then upgrading back to 1.11.2.

area/upgrades priority/important-soon

Most helpful comment

My workaround was the same (downgrade then upgrade).

All 19 comments

Just adding the link without the typo again, so that GitHub can link the issues: https://github.com/kubernetes/kubernetes/issues/65071

Same issue here with a single-master setup. Current Kubernetes version: 1.11.2. We are just trying to add flags, and the kubeadm upgrade process keeps getting stuck on the scheduler in the same way.

Exact same issue. 1.11.2, single master. Trying kubeadm upgrade apply --config flags.yaml and getting stuck at the API server and then the scheduler restarting.

Still an issue on 1.12.0. Docs mention that kubeadm upgrade apply is supposed to be idempotent.

/assign @timothysc
/cc @rdodev

@xiangpengzhao Thanks for the review. Done.

Hey @ArchiFleKs (and others in this thread), I want to replicate this as closely as possible. Would you folks mind sharing the config and the flags you attempted to change?

@rdodev I'm hitting this bug when I do the following:

  • have a cluster running with configuration from /etc/kubeadm.yaml
  • then I change only some kubeletConfiguration option in /etc/kubeadm.yaml (see the sketch after this list)

    • so there are no changes in kubernetesVersion

    • nor in any cluster configuration

  • then I run kubeadm upgrade apply --config /etc/kubeadm.yaml
  • the expected result is:

    • no changes in any files in /etc/kubernetes/manifests/

    • all component versions unchanged

    • only the kubelet configuration should change in the configmaps: kubeadm-config and kubelet-config-1.11

  • the actual result:

    • the upgrade gets stuck at the [upgrade/staticpods] Waiting for the kubelet to restart the component step
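
To make the kind of change concrete, here is a rough sketch of the relevant part of /etc/kubeadm.yaml as I understand the v1alpha2 layout, where kubelet settings sit under kubeletConfiguration.baseConfig; the version string and the evictionHard value are placeholders, not my exact change:

apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
kubernetesVersion: v1.11.2
kubeletConfiguration:
  baseConfig:
    evictionHard:
      memory.available: "200Mi"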


I think even a simpler scenario should work (as kubeadm is idempotent):

  1. kubeadm config view > /etc/kubeadm.yaml
  2. kubeadm upgrade apply --config /etc/kubeadm.yaml

So it should succeed even without any changes in configuration at all.
But it ends with the error:

[upgrade/apply] FATAL: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: [timed out waiting for the condition]

@ttarczynski great. Thanks for that. I think we found a different regression because of this :)

@ttarczynski can you check whether this PR solves the issue for you?

@bart0sh yes, I can try it, but I don't know how to get a binary with this patch applied.
Is it something I can do easily if I've never built kubernetes/kubeadm from source?

@bart0sh I've just managed to build the binary from PR #69886 and test it.
I've followed these steps:

  1. Have a cluster running version v1.12.1:
# kubectl version --short
Client Version: v1.12.1
Server Version: v1.12.1

# rpm -qa | egrep '^kube'
kubectl-1.12.1-2.x86_64
kubelet-1.12.1-2.x86_64
kubeadm-1.12.1-2.x86_64
  2. Put the new version of kubeadm in /tmp/kubeadm.PR69886
  3. Used this new kubeadm to write the config to a file
    (kubeadm config view seems to be broken in v1.12.1 as mentioned in #1174)
/tmp/kubeadm.PR69886 config view > /etc/kubeadm.yaml
  4. Run kubeadm v1.12.1 to make sure it still ends with an error:
# kubeadm upgrade apply --config /etc/kubeadm.yaml 
...
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-scheduler-ksb-m1.grey hash: 2117f54c43e401f807b7c9744c2a63be
...
[upgrade/apply] FATAL: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: [timed out waiting for the condition]
  5. Tested with kubeadm version from PR #69886
# /tmp/kubeadm.PR69886 upgrade apply --config /etc/kubeadm.yaml --force
...
[upgrade/staticpods] current and new manifests of kube-apiserver are equal, skipping upgrade
...
[upgrade/staticpods] current and new manifests of kube-scheduler are equal, skipping upgrade
...
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.12.1". Enjoy!

So it seems to me that PR #69886 does fix this issue.

@ArchiFleKs

I've only added a flag to kube-apiserver for now. I can see it in the diff, but kubeadm still tries to restart the controller-manager and the scheduler, and also shows some diff between manifests (basically volumeMount entries moved up and down the file compared to the file generated by kubeadm init).

This should be fixed by this PR

Was this issue ever fixed in v1.11? Or only in v1.12?

@ocofaigh I think the patch (PR #69886) is only available in v1.13 and not backported to older versions.

You can find this info in the release notes:

Fixed 'kubeadm upgrade' infinite loop waiting for pod restart (#69886, @bart0sh)

Hmm, is there any way I can work around this for v1.11? My scenario is I have a kubeadm v1.11.6 cluster set up, and I want to enable the PodSecurityPolicy admission plugin. So I should be able to:

  1. kubeadm config view > kubeadm-config.yaml
  2. Edit the kubeadm-config.yaml, locate the apiServerExtraArgs settings, and enable the PodSecurityPolicy admission plugin, e.g.:
apiServerExtraArgs:
  enable-admission-plugins: PodSecurityPolicy
  3. kubeadm upgrade apply --config=kubeadm-config.yaml

However, I get stuck in a loop outputting:
[upgrade/staticpods] Waiting for the kubelet to restart the component

I guess I could downgrade my cluster version to v1.11.5, and then enable the PodSecurityPolicy admission plugin as part of the upgrade from v1.11.5 -> v1.11.6, but that seems extreme. Anyone got a better idea?

My workaround was the same (downgrade then upgrade).
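
For anyone trying the same thing, my understanding of the workaround is: set kubernetesVersion in kubeadm-config.yaml to the lower patch release (v1.11.5 in the scenario above) while keeping the new apiServerExtraArgs, apply it, then set kubernetesVersion back to v1.11.6 and apply again. Roughly (the --force on the downgrade step is an assumption on my part and may not be needed):

kubeadm upgrade apply --config kubeadm-config.yaml --force
kubeadm upgrade apply --config kubeadm-config.yaml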

Can anyone else confirm that this fix wasn't backported to 1.12?

@brysonshepherd it wasn't.
