Kubeadm: When running a single control plane node cluster, `kubeadm upgrade` hangs after printing `[addons] Applied essential addon: CoreDNS`

Created on 19 Feb 2020 · 36 comments · Source: kubernetes/kubeadm

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version): 1.15.3-00 and 1.16.7-00

Environment:

  • Kubernetes version (use kubectl version): Already upgraded past what it was, sorry
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): Ubuntu 16.04.6 LTS
  • Kernel (e.g. uname -a): Linux debug-6 4.4.0-169-generic #198-Ubuntu SMP Tue Nov 12 10:38:00 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Others:

What happened?

Upgraded from 1.15 to 1.16 and 1.16 to 1.17 as per instructions in https://v1-16.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/ and https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/.

The upgrade hung during step (4) of https://v1-16.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/#upgrading-control-plane-nodes. The last thing `kubeadm upgrade` printed was `[addons] Applied essential addon: CoreDNS`. I let it sit there for 30 minutes and then skipped ahead to step (6) and uncordoned the control plane node. After uncordoning the node, it completed within a few minutes.

What you expected to happen?

I expected it to complete within half an hour.

How to reproduce it (as minimally and precisely as possible)?

I suspect if you run a single control plane cluster and follow the control plane upgrade steps, it'll repro. It looks like it will wait until coredns is ready. This will never happen because the single node in the cluster is drained, as per step (2) in https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/#upgrade-the-first-control-plane-node. So it waits until the timeout, which seems to initially be 5 seconds with exponential backoff up to 10 times or so. I think in aggregate that's over an hour, which would explain why it seemed to me to hang for half an hour.

Anything else we need to know?

We're running a single control plane cluster as per https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/, which seems to be the primary culprit here.

help wanted · kind/bug · priority/important-longterm

Most helpful comment

I am also getting the error: when running a single control plane cluster, kubeadm upgrade hangs after printing [addons] Applied essential addon: CoreDNS. I am trying to upgrade from v1.17.2 to v1.17.4.

All 36 comments

I suspect if you run a single control plane cluster and follow the control plane upgrade steps, it'll repro. It looks like it will wait until coredns is ready. This will never happen because the single node in the cluster is drained, as per step (2) in https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/#upgrade-the-first-control-plane-node. So it waits until the timeout, which seems to initially be 5 seconds with exponential backoff up to 10 times or so. I think in aggregate that's over an hour, which would explain why it seemed to me to hang for half an hour.

i ended up calculating this as 2555 seconds (~43 minutes) for the 10th step.
i think we should ideally reduce this timeout to something like 30 seconds max.

cc @rajansandeep

xref:
https://github.com/kubernetes/kubernetes/blob/3aa59f7f3077642592dc8a864fcef8ba98699894/cmd/kubeadm/app/phases/upgrade/postupgrade.go#L140
https://github.com/kubernetes/kubernetes/blob/dde6e8e7465468c32642659cb708a5cc922add64/cmd/kubeadm/app/util/apiclient/wait.go#L255-L260

/kind bug
/priority important-longterm
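
For anyone wondering how the ~2555 seconds add up, here is a minimal Go sketch of the arithmetic. The parameters (5-second initial interval, doubling on every retry, 10 attempts) are an assumption chosen to be consistent with the figure above, not a copy of kubeadm's wait.go:

package main

import (
    "fmt"
    "time"
)

func main() {
    // Assumed backoff parameters: 5s initial interval, doubled on every
    // retry, 10 attempts in total.
    interval := 5 * time.Second
    steps := 10

    var total time.Duration
    for i := 1; i < steps; i++ {
        total += interval // sleep between attempt i and attempt i+1
        interval *= 2
    }
    // 5+10+20+40+80+160+320+640+1280 = 2555 seconds (~43 minutes)
    fmt.Printf("worst-case wait before the final attempt: %v\n", total)
}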

I think reducing the timeout would improve the UX since it wouldn't hang for so long. That said, IIUC, it would still get reported as an error, which might be confusing to users.

I suspect if you run a single control plane cluster and follow the control plane upgrade steps, it'll repro. It looks like it will wait until coredns is ready.

The part of code you're referring to only triggers when the DNS server is changed.
I'm not sure this is the reason for the upgrade hanging.

Upgraded from 1.15 to 1.16 and 1.16 to 1.17 as per instructions in https://v1-16.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/ and https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/.

@willchan just to be clear, you were able to upgrade from 1.15 to 1.16 successfully, but faced this issue when upgrading from 1.16 to 1.17?

Maybe I'm misunderstanding the code, but AFAICT, it always waits for the expected DNS add-on. And then it always tries to delete the other add-on, but ignores the error if it's not found.
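
To make that flow concrete, here is a rough client-go sketch of the post-upgrade DNS handling being discussed: wait for the expected CoreDNS Deployment to report ready replicas, then delete the other add-on and ignore a NotFound error. The kubeconfig path, readiness condition, poll interval, and timeout are illustrative assumptions, not kubeadm's actual code:

package main

import (
    "context"
    "fmt"
    "time"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/wait"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    cfg, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/admin.conf")
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(cfg)

    // Wait until the expected DNS Deployment (CoreDNS) has ready replicas.
    // On a drained single-node cluster this never becomes true, so the poll
    // only returns once the overall timeout expires.
    err = wait.PollImmediate(5*time.Second, 40*time.Minute, func() (bool, error) {
        d, err := client.AppsV1().Deployments("kube-system").Get(context.TODO(), "coredns", metav1.GetOptions{})
        if err != nil {
            return false, nil // retry on transient errors
        }
        return d.Status.ReadyReplicas > 0, nil
    })
    if err != nil {
        fmt.Println("timed out waiting for the condition:", err)
    }

    // Delete the other DNS add-on (kube-dns), ignoring NotFound.
    err = client.AppsV1().Deployments("kube-system").Delete(context.TODO(), "kube-dns", metav1.DeleteOptions{})
    if err != nil && !apierrors.IsNotFound(err) {
        panic(err)
    }
}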

Upgraded from 1.15 to 1.16 and 1.16 to 1.17 as per instructions in https://v1-16.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/ and https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/.

@willchan just to be clear, you were able to upgrade from 1.15 to 1.16 successfully, but faced this issue when upgrading from 1.16 to 1.17?

I hit the same issue on both upgrades, from 1.15 to 1.16 and 1.16 to 1.17.

I hit the same issue on both upgrades, from 1.15 to 1.16 and 1.16 to 1.17.

Okay.
Before upgrading, is the DNS server kube-dns or CoreDNS?


I hit the same issue on both upgrades, from 1.15 to 1.16 and 1.16 to 1.17.

Okay.
Before upgrading, is the DNS server kube-dns or CoreDNS?

CoreDNS. I did not explicitly configure any feature flag in my setup, and I believe CoreDNS hit GA a number of releases ago, so any switch would have happened a while back.

/assign

Maybe we should check if there are any schedulable nodes to run the DNS deployment.
I have sent a PR kubernetes/kubernetes#88434 to try to solve this.
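
The idea behind that PR can be sketched with client-go roughly as follows; it mirrors the GET /api/v1/nodes?fieldSelector=spec.unschedulable=false request that shows up in the -v=5 logs later in this thread (the kubeconfig path is an assumption, and this is not the PR's actual code):

package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    cfg, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/admin.conf")
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(cfg)

    // Only nodes that are not cordoned/drained can run the CoreDNS Deployment.
    nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{
        FieldSelector: "spec.unschedulable=false",
    })
    if err != nil {
        panic(err)
    }
    if len(nodes.Items) == 0 {
        fmt.Println("no schedulable nodes: skip waiting for the DNS Deployment to become ready")
        return
    }
    fmt.Printf("%d schedulable node(s) found\n", len(nodes.Items))
}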

I got the same issue upgrading from 1.16.8 to 1.17.4. It is still running so I cannot send any further info at the moment.

Got same issue on 1.17.3 -> 1.18.0

Got same issue on 1.17.3 -> 1.18.0

my understanding was that this was fixed in 1.18.
can you show the output of the command with --v=5?

Sure, but it hangs and gives a post-upgrade error, so I just updated the kubelet; I think that's it?

[root@master user]#    kubeadm upgrade apply v1.18.0 -v=5
I0331 11:23:14.017619   19989 apply.go:112] [upgrade/apply] verifying health of cluster
I0331 11:23:14.018024   19989 apply.go:113] [upgrade/apply] retrieving configuration from cluster
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
I0331 11:23:14.048696   19989 common.go:123] running preflight checks
[preflight] Running pre-flight checks.
I0331 11:23:14.048838   19989 preflight.go:79] validating if there are any unsupported CoreDNS plugins in the Corefile
I0331 11:23:14.056960   19989 preflight.go:105] validating if migration can be done for the current CoreDNS release.
[upgrade] Running cluster health checks
I0331 11:23:14.061700   19989 health.go:158] Creating Job "upgrade-health-check" in the namespace "kube-system"
I0331 11:23:14.071587   19989 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0331 11:23:15.074000   19989 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0331 11:23:16.074639   19989 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0331 11:23:17.074024   19989 health.go:195] Job "upgrade-health-check" in the namespace "kube-system" completed
I0331 11:23:17.074049   19989 health.go:201] Deleting Job "upgrade-health-check" in the namespace "kube-system"
I0331 11:23:17.080363   19989 apply.go:120] [upgrade/apply] validating requested and actual version
I0331 11:23:17.080407   19989 apply.go:136] [upgrade/version] enforcing version skew policies
[upgrade/version] You have chosen to change the cluster version to "v1.18.0"
[upgrade/versions] Cluster version: v1.18.0
[upgrade/versions] kubeadm version: v1.18.0
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
I0331 11:23:19.015567   19989 apply.go:152] [upgrade/apply] creating prepuller
[upgrade/prepull] Will prepull images for components [kube-apiserver kube-controller-manager kube-scheduler etcd]
[upgrade/prepull] Prepulling image for component etcd.
[upgrade/prepull] Prepulling image for component kube-scheduler.
[upgrade/prepull] Prepulling image for component kube-apiserver.
[upgrade/prepull] Prepulling image for component kube-controller-manager.
[apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-etcd
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-controller-manager
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-apiserver
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-etcd
[upgrade/prepull] Prepulled image for component etcd.
[upgrade/prepull] Prepulled image for component kube-scheduler.
[upgrade/prepull] Prepulled image for component kube-apiserver.
[upgrade/prepull] Prepulled image for component kube-controller-manager.
[upgrade/prepull] Successfully prepulled the images for all the control plane components
I0331 11:23:21.055361   19989 apply.go:163] [upgrade/apply] performing upgrade
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.18.0"...
I0331 11:23:21.233716   19989 request.go:557] Throttling request took 177.321258ms, request: GET:https://10.10.10.10:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-master?timeout=10s
Static pod: kube-apiserver-master hash: 0b6c88d27d3c56d665d5e4043d1d8f7b
I0331 11:23:21.433726   19989 request.go:557] Throttling request took 192.492308ms, request: GET:https://10.10.10.10:6443/api/v1/namespaces/kube-system/pods/kube-controller-manager-master?timeout=10s
Static pod: kube-controller-manager-master hash: 334d1abebb5d44226b34ea18a8940065
I0331 11:23:21.633675   19989 request.go:557] Throttling request took 197.590337ms, request: GET:https://10.10.10.10:6443/api/v1/namespaces/kube-system/pods/kube-scheduler-master?timeout=10s
Static pod: kube-scheduler-master hash: 68835a2012b9716a7c018f4247ae940d
I0331 11:23:21.635730   19989 etcd.go:178] retrieving etcd endpoints from "kubeadm.kubernetes.io/etcd.advertise-client-urls" annotation in etcd Pods
I0331 11:23:21.833679   19989 request.go:557] Throttling request took 197.879295ms, request: GET:https://10.10.10.10:6443/api/v1/namespaces/kube-system/pods?labelSelector=component%3Detcd%2Ctier%3Dcontrol-plane
I0331 11:23:21.836670   19989 etcd.go:102] etcd endpoints read from pods: https://10.10.10.10:2379
I0331 11:23:21.845679   19989 etcd.go:250] etcd endpoints read from etcd: https://10.10.10.10:2379
I0331 11:23:21.845723   19989 etcd.go:120] update etcd endpoints: https://10.10.10.10:2379
[upgrade/etcd] Upgrading to TLS for etcd
[upgrade/etcd] Non fatal issue encountered during upgrade: the desired etcd version for this Kubernetes version "v1.18.0" is "3.4.3-0", but the current etcd version is "3.4.3". Won't downgrade etcd, instead just continue
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests566807007"
I0331 11:23:22.281480   19989 manifests.go:41] [control-plane] creating static Pod files
I0331 11:23:22.281493   19989 manifests.go:91] [control-plane] getting StaticPodSpecs
W0331 11:23:22.281726   19989 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
I0331 11:23:22.281964   19989 manifests.go:104] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I0331 11:23:22.281974   19989 manifests.go:104] [control-plane] adding volume "etc-pki" for component "kube-apiserver"
I0331 11:23:22.281978   19989 manifests.go:104] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I0331 11:23:22.287803   19989 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests566807007/kube-apiserver.yaml"
I0331 11:23:22.287836   19989 manifests.go:104] [control-plane] adding volume "ca-certs" for component "kube-controller-manager"
I0331 11:23:22.287850   19989 manifests.go:104] [control-plane] adding volume "etc-pki" for component "kube-controller-manager"
I0331 11:23:22.287857   19989 manifests.go:104] [control-plane] adding volume "flexvolume-dir" for component "kube-controller-manager"
I0331 11:23:22.287867   19989 manifests.go:104] [control-plane] adding volume "k8s-certs" for component "kube-controller-manager"
I0331 11:23:22.287874   19989 manifests.go:104] [control-plane] adding volume "kubeconfig" for component "kube-controller-manager"
I0331 11:23:22.288627   19989 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests566807007/kube-controller-manager.yaml"
I0331 11:23:22.288657   19989 manifests.go:104] [control-plane] adding volume "kubeconfig" for component "kube-scheduler"
I0331 11:23:22.289179   19989 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests566807007/kube-scheduler.yaml"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Current and new manifests of kube-apiserver are equal, skipping upgrade
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Current and new manifests of kube-controller-manager are equal, skipping upgrade
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Current and new manifests of kube-scheduler are equal, skipping upgrade
I0331 11:23:22.383676   19989 apply.go:169] [upgrade/postupgrade] upgrading RBAC rules and addons
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
I0331 11:23:22.633673   19989 request.go:557] Throttling request took 188.205754ms, request: POST:https://10.10.10.10:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s
I0331 11:23:22.833676   19989 request.go:557] Throttling request took 196.293672ms, request: PUT:https://10.10.10.10:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.18?timeout=10s
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.18" ConfigMap in the kube-system namespace
I0331 11:23:23.033701   19989 request.go:557] Throttling request took 189.200673ms, request: GET:https://10.10.10.10:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.18?timeout=10s
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
I0331 11:23:23.035822   19989 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "master" as an annotation
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
I0331 11:23:23.557343   19989 clusterinfo.go:79] creating the RBAC rules for exposing the cluster-info ConfigMap in the kube-public namespace
I0331 11:23:23.635821   19989 request.go:557] Throttling request took 74.68415ms, request: PUT:https://10.10.10.10:6443/apis/rbac.authorization.k8s.io/v1/namespaces/kube-public/roles/kubeadm:bootstrap-signer-clusterinfo?timeout=10s
I0331 11:23:23.835820   19989 request.go:557] Throttling request took 197.486809ms, request: POST:https://10.10.10.10:6443/apis/rbac.authorization.k8s.io/v1/namespaces/kube-public/rolebindings?timeout=10s
I0331 11:23:24.035835   19989 request.go:557] Throttling request took 196.634732ms, request: PUT:https://10.10.10.10:6443/apis/rbac.authorization.k8s.io/v1/namespaces/kube-public/rolebindings/kubeadm:bootstrap-signer-clusterinfo?timeout=10s
I0331 11:23:24.233695   19989 request.go:557] Throttling request took 185.629269ms, request: PUT:https://10.10.10.10:6443/api/v1/namespaces/kube-system/configmaps/coredns?timeout=10s
I0331 11:23:24.435815   19989 request.go:557] Throttling request took 196.663589ms, request: PUT:https://10.10.10.10:6443/apis/rbac.authorization.k8s.io/v1/clusterroles/system:coredns?timeout=10s
I0331 11:23:24.635816   19989 request.go:557] Throttling request took 197.817314ms, request: POST:https://10.10.10.10:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?timeout=10s
I0331 11:23:24.835844   19989 request.go:557] Throttling request took 196.604798ms, request: PUT:https://10.10.10.10:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/system:coredns?timeout=10s
[addons] Applied essential addon: CoreDNS
I0331 11:23:25.033673   19989 request.go:557] Throttling request took 175.122741ms, request: GET:https://10.10.10.10:6443/api/v1/nodes?fieldSelector=spec.unschedulable%3Dfalse


[addons] Applied essential addon: kube-proxy
timed out waiting for the condition
[upgrade/postupgrade] FATAL post-upgrade error
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.runApply
    /workspace/anago-v1.18.0-rc.1.21+8be33caaf953ac/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade/apply.go:171
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.NewCmdApply.func1
    /workspace/anago-v1.18.0-rc.1.21+8be33caaf953ac/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade/apply.go:79
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
    /workspace/anago-v1.18.0-rc.1.21+8be33caaf953ac/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:826
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
    /workspace/anago-v1.18.0-rc.1.21+8be33caaf953ac/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
    /workspace/anago-v1.18.0-rc.1.21+8be33caaf953ac/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
k8s.io/kubernetes/cmd/kubeadm/app.Run
    /workspace/anago-v1.18.0-rc.1.21+8be33caaf953ac/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
    _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
    /usr/local/go/src/runtime/proc.go:203
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1357

timed out waiting for the condition

seems like a temporary timeout, but the error is not very descriptive.
try calling kubeadm upgrade apply v1.18.0 -v=10 again and see if it passes.

Sure, but it hangs and gives a post-upgrade error, so I just updated the kubelet; I think that's it?

note that kubelet upgrades should be applied only after kubeadm upgrade... has passed on all nodes.

Draining all nodes and upgrading each node individually was successful.

Can somebody sum up what one needs to do when you get into this situation? Trying to upgrade 1.17.4 -> 1.18.1 and it hangs at this step.

@artisticcheese this should have been fixed in 1.18.1.
can you provide logs using at least -v=5?

one option is to just delete the coredns deployment and reapply it using the 1.18.1 kubeadm binary with kubeadm init phase addons dns --config myconfig.yaml, which should make the upgrade apply then pass, in theory...

Here is the log:

I0409 13:23:29.767850   43075 apply.go:112] [upgrade/apply] verifying health of cluster
I0409 13:23:29.767931   43075 apply.go:113] [upgrade/apply] retrieving configuration from cluster
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
I0409 13:23:29.808811   43075 common.go:123] running preflight checks
[preflight] Running pre-flight checks.
I0409 13:23:29.809039   43075 preflight.go:79] validating if there are any unsupported CoreDNS plugins in the Corefile
I0409 13:23:29.817934   43075 preflight.go:105] validating if migration can be done for the current CoreDNS release.
[upgrade] Running cluster health checks
I0409 13:23:29.823619   43075 health.go:158] Creating Job "upgrade-health-check" in the namespace "kube-system"
I0409 13:23:29.838292   43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:30.841703   43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:31.841373   43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:32.841843   43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:33.841561   43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:34.841718   43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:35.841798   43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:36.841901   43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:37.842199   43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:38.842155   43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:39.841545   43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:40.841964   43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:41.841774   43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:42.841751   43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:43.841690   43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:44.841670   43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:44.843229   43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:44.843291   43075 health.go:201] Deleting Job "upgrade-health-check" in the namespace "kube-system"
I0409 13:23:44.857519   43075 apply.go:120] [upgrade/apply] validating requested and actual version
I0409 13:23:44.857603   43075 apply.go:136] [upgrade/version] enforcing version skew policies
[upgrade/version] You have chosen to change the cluster version to "v1.18.1"
[upgrade/versions] Cluster version: v1.18.1
[upgrade/versions] kubeadm version: v1.18.1
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
I0409 13:23:47.248348   43075 apply.go:152] [upgrade/apply] creating prepuller
[upgrade/prepull] Will prepull images for components [kube-apiserver kube-controller-manager kube-scheduler etcd]
[upgrade/prepull] Prepulling image for component etcd.
[upgrade/prepull] Prepulling image for component kube-apiserver.
[upgrade/prepull] Prepulling image for component kube-controller-manager.
[upgrade/prepull] Prepulling image for component kube-scheduler.
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-apiserver
[apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-etcd
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-controller-manager
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
I0409 13:23:50.126776   43075 request.go:557] Throttling request took 138.760855ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-etcd
I0409 13:23:50.726778   43075 request.go:557] Throttling request took 239.843395ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-etcd
I0409 13:23:50.926709   43075 request.go:557] Throttling request took 439.657374ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-controller-manager
I0409 13:23:51.126776   43075 request.go:557] Throttling request took 202.06138ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-apiserver
I0409 13:23:51.326772   43075 request.go:557] Throttling request took 343.494836ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-scheduler
I0409 13:23:51.526756   43075 request.go:557] Throttling request took 539.794114ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-etcd
I0409 13:23:51.726780   43075 request.go:557] Throttling request took 739.764794ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-controller-manager
I0409 13:23:51.926748   43075 request.go:557] Throttling request took 502.0449ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-apiserver
I0409 13:23:52.726819   43075 request.go:557] Throttling request took 239.705461ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-controller-manager
I0409 13:23:53.126753   43075 request.go:557] Throttling request took 143.454115ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-scheduler
I0409 13:23:53.327022   43075 request.go:557] Throttling request took 339.968298ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-etcd
I0409 13:23:53.527063   43075 request.go:557] Throttling request took 102.315694ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-apiserver
I0409 13:23:53.726768   43075 request.go:557] Throttling request took 243.508023ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-scheduler
I0409 13:23:53.926719   43075 request.go:557] Throttling request took 439.774103ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-etcd
I0409 13:23:54.126764   43075 request.go:557] Throttling request took 639.686487ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-controller-manager
I0409 13:23:54.326785   43075 request.go:557] Throttling request took 402.015468ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-apiserver
I0409 13:23:54.526876   43075 request.go:557] Throttling request took 543.423198ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-scheduler
I0409 13:23:54.727054   43075 request.go:557] Throttling request took 299.991874ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-apiserver
I0409 13:23:54.926736   43075 request.go:557] Throttling request took 439.423302ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-controller-manager
[upgrade/prepull] Prepulled image for component kube-controller-manager.
I0409 13:23:55.126776   43075 request.go:557] Throttling request took 639.446901ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-etcd
I0409 13:23:55.326769   43075 request.go:557] Throttling request took 402.067509ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-apiserver
[upgrade/prepull] Prepulled image for component kube-apiserver.
I0409 13:23:55.526736   43075 request.go:557] Throttling request took 543.315563ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-scheduler
I0409 13:23:55.726767   43075 request.go:557] Throttling request took 239.76795ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-etcd
[upgrade/prepull] Prepulled image for component kube-scheduler.
[upgrade/prepull] Prepulled image for component etcd.
[upgrade/prepull] Successfully prepulled the images for all the control plane components
I0409 13:23:56.494800   43075 apply.go:163] [upgrade/apply] performing upgrade
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.18.1"...
Static pod: kube-apiserver-master1 hash: d8fa3cb5202e83fad7b470065625baf6
Static pod: kube-controller-manager-master1 hash: f3307a0d84982d3c0a356d4d4bacd7d9
Static pod: kube-scheduler-master1 hash: 2d7e8eb6d5a1f262a0ebe72a1c048ff6
I0409 13:23:56.534348   43075 etcd.go:178] retrieving etcd endpoints from "kubeadm.kubernetes.io/etcd.advertise-client-urls" annotation in etcd Pods
I0409 13:23:56.726752   43075 request.go:557] Throttling request took 192.318699ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=component%3Detcd%2Ctier%3Dcontrol-plane
I0409 13:23:56.731445   43075 etcd.go:192] etcd Pod "etcd-master1" is missing the "kubeadm.kubernetes.io/etcd.advertise-client-urls" annotation; cannot infer etcd advertise client URL using the Pod annotation
I0409 13:23:56.731652   43075 etcd.go:202] retrieving etcd endpoints from the cluster status
I0409 13:23:56.926752   43075 request.go:557] Throttling request took 194.866102ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s
I0409 13:23:56.929805   43075 etcd.go:102] etcd endpoints read from pods: https://10.0.0.4:2379
I0409 13:23:56.941991   43075 etcd.go:250] etcd endpoints read from etcd: https://10.0.0.4:2379
I0409 13:23:56.942233   43075 etcd.go:120] update etcd endpoints: https://10.0.0.4:2379
[upgrade/etcd] Upgrading to TLS for etcd
[upgrade/etcd] Non fatal issue encountered during upgrade: the desired etcd version for this Kubernetes version "v1.18.1" is "3.4.3-0", but the current etcd version is "3.4.3". Won't downgrade etcd, instead just continue
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests799019872"
I0409 13:23:57.348548   43075 manifests.go:41] [control-plane] creating static Pod files
I0409 13:23:57.348557   43075 manifests.go:91] [control-plane] getting StaticPodSpecs
W0409 13:23:57.348682   43075 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
I0409 13:23:57.349085   43075 manifests.go:104] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I0409 13:23:57.349103   43075 manifests.go:104] [control-plane] adding volume "etc-ca-certificates" for component "kube-apiserver"
I0409 13:23:57.349111   43075 manifests.go:104] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I0409 13:23:57.349118   43075 manifests.go:104] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-apiserver"
I0409 13:23:57.349123   43075 manifests.go:104] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-apiserver"
I0409 13:23:57.355099   43075 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests799019872/kube-apiserver.yaml"
I0409 13:23:57.355129   43075 manifests.go:104] [control-plane] adding volume "ca-certs" for component "kube-controller-manager"
I0409 13:23:57.355138   43075 manifests.go:104] [control-plane] adding volume "etc-ca-certificates" for component "kube-controller-manager"
I0409 13:23:57.355143   43075 manifests.go:104] [control-plane] adding volume "flexvolume-dir" for component "kube-controller-manager"
I0409 13:23:57.355148   43075 manifests.go:104] [control-plane] adding volume "k8s-certs" for component "kube-controller-manager"
I0409 13:23:57.355154   43075 manifests.go:104] [control-plane] adding volume "kubeconfig" for component "kube-controller-manager"
I0409 13:23:57.355159   43075 manifests.go:104] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-controller-manager"
I0409 13:23:57.355165   43075 manifests.go:104] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-controller-manager"
I0409 13:23:57.356038   43075 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests799019872/kube-controller-manager.yaml"
I0409 13:23:57.356062   43075 manifests.go:104] [control-plane] adding volume "kubeconfig" for component "kube-scheduler"
I0409 13:23:57.356726   43075 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests799019872/kube-scheduler.yaml"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Current and new manifests of kube-apiserver are equal, skipping upgrade
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Current and new manifests of kube-controller-manager are equal, skipping upgrade
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Current and new manifests of kube-scheduler are equal, skipping upgrade
I0409 13:23:57.735769   43075 apply.go:169] [upgrade/postupgrade] upgrading RBAC rules and addons
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
I0409 13:23:57.926792   43075 request.go:557] Throttling request took 128.366533ms, request: PUT:https://10.0.0.4:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.18?timeout=10s
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.18" ConfigMap in the kube-system namespace
I0409 13:23:58.126783   43075 request.go:557] Throttling request took 166.672072ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.18?timeout=10s
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
I0409 13:23:58.130594   43075 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "master1" as an annotation
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
I0409 13:24:28.336865   43075 clusterinfo.go:79] creating the RBAC rules for exposing the cluster-info ConfigMap in the kube-public namespace
[addons] Applied essential addon: CoreDNS

I'm not sure I understand what I shall do in the meantime

as i said you can:

  • ctrl+c
  • delete coredns deployment
  • kubeadm init phase addons dns --config myconfig.yaml
  • call upgrade again

I assume config.yaml is created by kubectl get deployment coredns -n kube-system -o yaml ?

root@master1:~# kubeadm init phase addons dns --config config.yaml
unknown flag: --config
To see the stack trace of this error execute with --v=5 or higher

--config should point to the config file you used for --config when you called kubeadm init.
you can download the clusterconfiguration like so:

kubectl get cm -n kube-system kubeadm-config -o yaml > config.yaml

and then use config.yaml with ...addons dns.

The command for kubeadm is not correct. It does not like --config:

root@master1:~# kubeadm init phase addons dns --config clusterconfig.yaml        
unknown flag: --config
To see the stack trace of this error execute with --v=5 or higher

I assume it was supposed to be addon coredns instead of addons dns

root@master1:~# kubeadm init phase addon coredns --config clusterconfig.yaml   
W0409 14:13:27.362828   66305 strict.go:47] unknown configuration schema.GroupVersionKind{Group:"", Version:"v1", Kind:"ConfigMap"} for scheme definitions in "k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/scheme/scheme.go:31" and "k8s.io/kubernetes/cmd/kubeadm/app/componentconfigs/scheme.go:28"
no InitConfiguration or ClusterConfiguration kind was found in the YAML file
To see the stack trace of this error execute with --v=5 or higher

My clusterconfig is below

root@master1:~# cat clusterconfig.yaml 
apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      extraArgs:
        authorization-mode: Node,RBAC
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta2
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      local:
        dataDir: /var/lib/etcd
    imageRepository: k8s.gcr.io
    kind: ClusterConfiguration
    kubernetesVersion: v1.18.1
    networking:
      dnsDomain: cluster.local
      podSubnet: 10.244.0.0/16
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
  ClusterStatus: |
    apiEndpoints:
      master1:
        advertiseAddress: 10.0.0.4
        bindPort: 6443
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterStatus
kind: ConfigMap
metadata:
  creationTimestamp: "2020-03-15T23:35:26Z"
  name: kubeadm-config
  namespace: kube-system
  resourceVersion: "2053412"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kubeadm-config
  uid: 7d47e6b6-043f-41a9-a9e9-53600efee7df

my mistake: in addition to getting the cm, you also need to extract only the ClusterConfiguration part from the ConfigMap.

just try copy-pasting this into a file:

apiServer:
  extraArgs:
    authorization-mode: Node,RBAC
timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.18.1
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}

then use it with kubeadm init phase addon coredns --config clusterconfig.yaml

the purpose of passing a config is to preserve the setup you used on kubeadm init originally...
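
If you would rather not copy-paste, here is a small client-go sketch that pulls only the ClusterConfiguration document out of the kubeadm-config ConfigMap and writes it to a file; the kubeconfig path and output filename are assumptions:

package main

import (
    "context"
    "os"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    cfg, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/admin.conf")
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(cfg)

    cm, err := client.CoreV1().ConfigMaps("kube-system").Get(context.TODO(), "kubeadm-config", metav1.GetOptions{})
    if err != nil {
        panic(err)
    }
    // The ConfigMap stores the ClusterConfiguration as a plain YAML string,
    // which is what kubeadm init phase addon coredns --config expects.
    if err := os.WriteFile("clusterconfig.yaml", []byte(cm.Data["ClusterConfiguration"]), 0o600); err != nil {
        panic(err)
    }
}

The same extraction should also work from the command line with something like kubectl -n kube-system get cm kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' > clusterconfig.yaml.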

Not sure why this is marked as fixed, since it actually requires ALL nodes to be drained, not the control plane only. That's what hangs the process on a single-node control plane. I have a worker node that is still schedulable, and hence I guess the original fix never ran.
This fixed it: https://github.com/kubernetes/kubeadm/issues/2035#issuecomment-607330985

Not sure why this is marked as fixed, since it actually requires ALL nodes to be drained, not the control plane only

well, this is what the kubeadm upgrade guide requires:
https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/

cc @rajansandeep perhaps you have a better idea what is going on here?

Yes, let me take a look.
I'd like to try and reproduce this first.

For me it was a single-node control plane with a single Windows worker node, created on 1.17.4 originally, and trying to upgrade it to 1.18.1.

I am also getting the error: when running a single control plane cluster, kubeadm upgrade hangs after printing [addons] Applied essential addon: CoreDNS. I am trying to upgrade from v1.17.2 to v1.17.4.

I was unable to reproduce the issue:

I tried kubeadm upgrade from 1.17.4->1.18.1 following the instructions from https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/ with 2 scenarios:
One with a single-node control plane plus a single worker node,
and the second with just a single node.

In both cases, I was successfully able to upgrade kubeadm with no hang issues.

Note that this was a base kubeadm install with no modifications.

I reproduced this using a single-node cluster (one node serving as both control plane and worker) when upgrading from v1.15.0 to v1.16.12. I ran the following commands to apply the upgrade:

apt-get update && \
apt-get install -y --allow-change-held-packages kubeadm=1.16.12-00
kubeadm upgrade plan
kubectl drain test-node --ignore-daemonsets --delete-local-data
kubeadm upgrade apply v1.16.12

After running the upgrade apply command, the CLI got stuck on "[addons] Applied essential addon: CoreDNS".

Running the upgrade apply command with -v=10 indicates kubeadm is waiting for a CoreDNS pod to start. However, this is a single-node cluster and its only node is already drained, so no CoreDNS pod can be started. I wonder if this is the root cause of the issue.

I did an experiment to test my assumption: when the upgrade process got stuck on "[addons] Applied essential addon: CoreDNS", I uncordoned the only node in the cluster, and the upgrade process then finished successfully.

Same here. I am trying to upgrade from v1.18.5 to v1.19.3 but get stuck with a timeout on
[addons] Applied essential addon: kube-proxy

Draining all nodes and upgrading each node individually was successful.

That also helped for me here.

oddly we do not see this problem in CI / e2e.

but we have minor pending changes to the upgrade docs related to this:
https://github.com/kubernetes/website/pull/24704

summary: drain / uncordon should be done around the kubelet restart for CP nodes.

I'm reproducing this issue from my Chef cookbook when trying to upgrade from 1.16 to 1.17 on Debian Buster (while the same upgrade passes on Debian Stretch).

Someone on SO said that the nodeRegistration.name needs to be the same as the machine's hostname. I don't have any output from the command kubectl -n kube-system get cm kubeadm-config -o jsonpath={.data.MasterConfiguration}, which I guess is from an old Kubernetes version and therefore doesn't work anymore.
How can I check what the nodeRegistration.name is, or its equivalent today?

How can I check what's the nodeRegistration.name, or its equivalent from today?

.name is passed to the kubelet and that is how the node is registered.
so it's equivalent to the node name.

Thank you @neolit123 for your comment.

On my side, I've finished testing each version of Kubernetes (from 1.15 to 1.20): 1.17, 1.19 and 1.20 are failing on Debian Buster. On Stretch all are passing.

I will now test the same but with Ubuntu 18.04.
