BUG REPORT
kubeadm version (use kubeadm version): 1.15.3-00 and 1.16.7-00
Environment:
kubectl version: Already upgraded past what it was, sorry
uname -a: Linux debug-6 4.4.0-169-generic #198-Ubuntu SMP Tue Nov 12 10:38:00 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Upgraded from 1.15 to 1.16 and 1.16 to 1.17 as per instructions in https://v1-16.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/ and https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/.
The upgrade hung during step (4) of https://v1-16.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/#upgrading-control-plane-nodes. The last thing kubeadm upgrade printed was "[addons] Applied essential addon: CoreDNS". I let it sit there for 30 minutes and then skipped ahead to step (6) and uncordoned the control plane node. After uncordoning the node, it completed within a few minutes.
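(In shell terms, the uncordon workaround is just the standard step (6) command; sketched below with a placeholder node name, plus an illustrative check that CoreDNS then schedules.)
# make the drained control plane node schedulable again so the CoreDNS pods
# kubeadm is waiting for can actually be scheduled
kubectl uncordon <control-plane-node-name>
# illustrative: watch the CoreDNS pods come back up while kubeadm finishes
kubectl -n kube-system get pods -l k8s-app=kube-dns -w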
I expected it to complete within half an hour.
I suspect if you run a single control plane cluster and follow the control plane upgrade steps, it'll repro. It looks like it will wait until coredns is ready. This will never happen because the single node in the cluster is drained, as per step (2) in https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/#upgrade-the-first-control-plane-node. So it waits until the timeout, which seems to initially be 5 seconds with exponential backoff up to 10 times or so. I think in aggregate that's over an hour, which would explain why it seemed to me to hang for half an hour.
We're running a single control plane cluster as per https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/, which seems to be the primary culprit here.
i ended up calculating this as 2555 seconds (~43 minutes) for the 10th step.
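(a back-of-the-envelope version of that calculation, assuming a 5 second initial delay that doubles on every retry; the exact backoff parameters are in the wait.go link below:)
# cumulative wait before the 10th retry: 5 + 10 + 20 + ... + 1280 = 2555s (~43 min)
total=0; delay=5
for step in $(seq 1 9); do
  total=$((total + delay))
  delay=$((delay * 2))
done
echo "${total}s"   # -> 2555s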
i think we should ideally reduce this timeout to something like 30 seconds max.
cc @rajansandeep
xref:
https://github.com/kubernetes/kubernetes/blob/3aa59f7f3077642592dc8a864fcef8ba98699894/cmd/kubeadm/app/phases/upgrade/postupgrade.go#L140
https://github.com/kubernetes/kubernetes/blob/dde6e8e7465468c32642659cb708a5cc922add64/cmd/kubeadm/app/util/apiclient/wait.go#L255-L260
/kind bug
/priority important-longterm
I think reducing the timeout would improve the UX since it wouldn't hang for so long. That said, IIUC, it would still get reported as an error, which might be confusing to users.
I suspect if you run a single control plane cluster and follow the control plane upgrade steps, it'll repro. It looks like it will wait until coredns is ready.
The part of code you're referring to only triggers when the DNS server is changed.
I'm not sure this is the reason for the upgrade hanging.
Upgraded from 1.15 to 1.16 and 1.16 to 1.17 as per instructions in https://v1-16.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/ and https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/.
@willchan just to be clear, you were able to upgrade from 1.15 to 1.16 successfully, but faced this issue when upgrading from 1.16 to 1.17?
Maybe I'm misunderstanding the code, but AFAICT, it always waits for the expected DNS add-on. And then it always tries to delete the other add-on, but ignores the error if it's not found.
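(In shell terms, the behaviour described above is roughly equivalent to the following; illustrative only, this is not the actual kubeadm code:)
# wait for the expected DNS add-on (CoreDNS here) to become ready
kubectl -n kube-system rollout status deployment/coredns --timeout=300s
# then remove the other add-on, ignoring the error if it does not exist
kubectl -n kube-system delete deployment kube-dns --ignore-not-found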
I hit the same issue on both upgrades, from 1.15 to 1.16 and 1.16 to 1.17.
Okay.
Before upgrading, is the DNS server kube-dns or CoreDNS?
CoreDNS. I did not explicitly configure any feature flag in my setup, and I believe CoreDNS hit GA a number of releases ago, so any switch would have happened awhile back.
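(For anyone checking their own cluster, one way to see which DNS add-on is deployed; standard kubectl, shown purely as an illustration:)
kubectl -n kube-system get deployments coredns kube-dns --ignore-not-found
# or inspect the image used by the DNS pods
kubectl -n kube-system get pods -l k8s-app=kube-dns -o jsonpath='{range .items[*]}{.spec.containers[0].image}{"\n"}{end}'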
/assign
Maybe we should check whether there are any schedulable nodes to run the DNS deployment on.
I have sent a PR kubernetes/kubernetes#88434 to try to solve this.
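(In kubectl terms, the check suggested above corresponds roughly to the query already visible in the -v=5 logs further down, i.e. the spec.unschedulable=false field selector:)
# list nodes that are still schedulable; if this is empty, the DNS deployment can never become ready
kubectl get nodes --field-selector spec.unschedulable=false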
I got the same issue upgrading from 1.16.8 to 1.17.4. It is still running so I cannot send any further info at the moment.
Got same issue on 1.17.3 -> 1.18.0
my understanding was that this was fixed in 1.18.
can you show the output of the command with --v=5?
Sure, but it hangs and gives a post-upgrade error, so I just updated kubelet. I think that's it?
[root@master user]# kubeadm upgrade apply v1.18.0 -v=5
I0331 11:23:14.017619 19989 apply.go:112] [upgrade/apply] verifying health of cluster
I0331 11:23:14.018024 19989 apply.go:113] [upgrade/apply] retrieving configuration from cluster
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
I0331 11:23:14.048696 19989 common.go:123] running preflight checks
[preflight] Running pre-flight checks.
I0331 11:23:14.048838 19989 preflight.go:79] validating if there are any unsupported CoreDNS plugins in the Corefile
I0331 11:23:14.056960 19989 preflight.go:105] validating if migration can be done for the current CoreDNS release.
[upgrade] Running cluster health checks
I0331 11:23:14.061700 19989 health.go:158] Creating Job "upgrade-health-check" in the namespace "kube-system"
I0331 11:23:14.071587 19989 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0331 11:23:15.074000 19989 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0331 11:23:16.074639 19989 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0331 11:23:17.074024 19989 health.go:195] Job "upgrade-health-check" in the namespace "kube-system" completed
I0331 11:23:17.074049 19989 health.go:201] Deleting Job "upgrade-health-check" in the namespace "kube-system"
I0331 11:23:17.080363 19989 apply.go:120] [upgrade/apply] validating requested and actual version
I0331 11:23:17.080407 19989 apply.go:136] [upgrade/version] enforcing version skew policies
[upgrade/version] You have chosen to change the cluster version to "v1.18.0"
[upgrade/versions] Cluster version: v1.18.0
[upgrade/versions] kubeadm version: v1.18.0
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
I0331 11:23:19.015567 19989 apply.go:152] [upgrade/apply] creating prepuller
[upgrade/prepull] Will prepull images for components [kube-apiserver kube-controller-manager kube-scheduler etcd]
[upgrade/prepull] Prepulling image for component etcd.
[upgrade/prepull] Prepulling image for component kube-scheduler.
[upgrade/prepull] Prepulling image for component kube-apiserver.
[upgrade/prepull] Prepulling image for component kube-controller-manager.
[apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-etcd
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-controller-manager
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-apiserver
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-etcd
[upgrade/prepull] Prepulled image for component etcd.
[upgrade/prepull] Prepulled image for component kube-scheduler.
[upgrade/prepull] Prepulled image for component kube-apiserver.
[upgrade/prepull] Prepulled image for component kube-controller-manager.
[upgrade/prepull] Successfully prepulled the images for all the control plane components
I0331 11:23:21.055361 19989 apply.go:163] [upgrade/apply] performing upgrade
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.18.0"...
I0331 11:23:21.233716 19989 request.go:557] Throttling request took 177.321258ms, request: GET:https://10.10.10.10:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-master?timeout=10s
Static pod: kube-apiserver-master hash: 0b6c88d27d3c56d665d5e4043d1d8f7b
I0331 11:23:21.433726 19989 request.go:557] Throttling request took 192.492308ms, request: GET:https://10.10.10.10:6443/api/v1/namespaces/kube-system/pods/kube-controller-manager-master?timeout=10s
Static pod: kube-controller-manager-master hash: 334d1abebb5d44226b34ea18a8940065
I0331 11:23:21.633675 19989 request.go:557] Throttling request took 197.590337ms, request: GET:https://10.10.10.10:6443/api/v1/namespaces/kube-system/pods/kube-scheduler-master?timeout=10s
Static pod: kube-scheduler-master hash: 68835a2012b9716a7c018f4247ae940d
I0331 11:23:21.635730 19989 etcd.go:178] retrieving etcd endpoints from "kubeadm.kubernetes.io/etcd.advertise-client-urls" annotation in etcd Pods
I0331 11:23:21.833679 19989 request.go:557] Throttling request took 197.879295ms, request: GET:https://10.10.10.10:6443/api/v1/namespaces/kube-system/pods?labelSelector=component%3Detcd%2Ctier%3Dcontrol-plane
I0331 11:23:21.836670 19989 etcd.go:102] etcd endpoints read from pods: https://10.10.10.10:2379
I0331 11:23:21.845679 19989 etcd.go:250] etcd endpoints read from etcd: https://10.10.10.10:2379
I0331 11:23:21.845723 19989 etcd.go:120] update etcd endpoints: https://10.10.10.10:2379
[upgrade/etcd] Upgrading to TLS for etcd
[upgrade/etcd] Non fatal issue encountered during upgrade: the desired etcd version for this Kubernetes version "v1.18.0" is "3.4.3-0", but the current etcd version is "3.4.3". Won't downgrade etcd, instead just continue
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests566807007"
I0331 11:23:22.281480 19989 manifests.go:41] [control-plane] creating static Pod files
I0331 11:23:22.281493 19989 manifests.go:91] [control-plane] getting StaticPodSpecs
W0331 11:23:22.281726 19989 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
I0331 11:23:22.281964 19989 manifests.go:104] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I0331 11:23:22.281974 19989 manifests.go:104] [control-plane] adding volume "etc-pki" for component "kube-apiserver"
I0331 11:23:22.281978 19989 manifests.go:104] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I0331 11:23:22.287803 19989 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests566807007/kube-apiserver.yaml"
I0331 11:23:22.287836 19989 manifests.go:104] [control-plane] adding volume "ca-certs" for component "kube-controller-manager"
I0331 11:23:22.287850 19989 manifests.go:104] [control-plane] adding volume "etc-pki" for component "kube-controller-manager"
I0331 11:23:22.287857 19989 manifests.go:104] [control-plane] adding volume "flexvolume-dir" for component "kube-controller-manager"
I0331 11:23:22.287867 19989 manifests.go:104] [control-plane] adding volume "k8s-certs" for component "kube-controller-manager"
I0331 11:23:22.287874 19989 manifests.go:104] [control-plane] adding volume "kubeconfig" for component "kube-controller-manager"
I0331 11:23:22.288627 19989 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests566807007/kube-controller-manager.yaml"
I0331 11:23:22.288657 19989 manifests.go:104] [control-plane] adding volume "kubeconfig" for component "kube-scheduler"
I0331 11:23:22.289179 19989 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests566807007/kube-scheduler.yaml"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Current and new manifests of kube-apiserver are equal, skipping upgrade
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Current and new manifests of kube-controller-manager are equal, skipping upgrade
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Current and new manifests of kube-scheduler are equal, skipping upgrade
I0331 11:23:22.383676 19989 apply.go:169] [upgrade/postupgrade] upgrading RBAC rules and addons
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
I0331 11:23:22.633673 19989 request.go:557] Throttling request took 188.205754ms, request: POST:https://10.10.10.10:6443/api/v1/namespaces/kube-system/configmaps?timeout=10s
I0331 11:23:22.833676 19989 request.go:557] Throttling request took 196.293672ms, request: PUT:https://10.10.10.10:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.18?timeout=10s
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.18" ConfigMap in the kube-system namespace
I0331 11:23:23.033701 19989 request.go:557] Throttling request took 189.200673ms, request: GET:https://10.10.10.10:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.18?timeout=10s
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
I0331 11:23:23.035822 19989 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "master" as an annotation
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
I0331 11:23:23.557343 19989 clusterinfo.go:79] creating the RBAC rules for exposing the cluster-info ConfigMap in the kube-public namespace
I0331 11:23:23.635821 19989 request.go:557] Throttling request took 74.68415ms, request: PUT:https://10.10.10.10:6443/apis/rbac.authorization.k8s.io/v1/namespaces/kube-public/roles/kubeadm:bootstrap-signer-clusterinfo?timeout=10s
I0331 11:23:23.835820 19989 request.go:557] Throttling request took 197.486809ms, request: POST:https://10.10.10.10:6443/apis/rbac.authorization.k8s.io/v1/namespaces/kube-public/rolebindings?timeout=10s
I0331 11:23:24.035835 19989 request.go:557] Throttling request took 196.634732ms, request: PUT:https://10.10.10.10:6443/apis/rbac.authorization.k8s.io/v1/namespaces/kube-public/rolebindings/kubeadm:bootstrap-signer-clusterinfo?timeout=10s
I0331 11:23:24.233695 19989 request.go:557] Throttling request took 185.629269ms, request: PUT:https://10.10.10.10:6443/api/v1/namespaces/kube-system/configmaps/coredns?timeout=10s
I0331 11:23:24.435815 19989 request.go:557] Throttling request took 196.663589ms, request: PUT:https://10.10.10.10:6443/apis/rbac.authorization.k8s.io/v1/clusterroles/system:coredns?timeout=10s
I0331 11:23:24.635816 19989 request.go:557] Throttling request took 197.817314ms, request: POST:https://10.10.10.10:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?timeout=10s
I0331 11:23:24.835844 19989 request.go:557] Throttling request took 196.604798ms, request: PUT:https://10.10.10.10:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/system:coredns?timeout=10s
[addons] Applied essential addon: CoreDNS
I0331 11:23:25.033673 19989 request.go:557] Throttling request took 175.122741ms, request: GET:https://10.10.10.10:6443/api/v1/nodes?fieldSelector=spec.unschedulable%3Dfalse
[addons] Applied essential addon: kube-proxy
timed out waiting for the condition
[upgrade/postupgrade] FATAL post-upgrade error
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.runApply
/workspace/anago-v1.18.0-rc.1.21+8be33caaf953ac/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade/apply.go:171
k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade.NewCmdApply.func1
/workspace/anago-v1.18.0-rc.1.21+8be33caaf953ac/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/upgrade/apply.go:79
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
/workspace/anago-v1.18.0-rc.1.21+8be33caaf953ac/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:826
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
/workspace/anago-v1.18.0-rc.1.21+8be33caaf953ac/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
/workspace/anago-v1.18.0-rc.1.21+8be33caaf953ac/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
k8s.io/kubernetes/cmd/kubeadm/app.Run
/workspace/anago-v1.18.0-rc.1.21+8be33caaf953ac/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:203
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1357
timed out waiting for the condition
seems like a temporary timeout, but the error is not very descriptive.
try calling kubeadm upgrade apply v1.18.0 -v=10 again and see if it passes.
Sure, but it hangs and gives a post-upgrade error, so I just updated kubelet. I think that's it?
note that kubelet upgrades should be applied only after kubeadm upgrade... has passed on all nodes.
Draining all nodes and upgrading each one individually was successful.
Can somebody sum up what one needs to do when they get into this situation? I'm trying to upgrade 1.17.4 -> 1.18.1 and it hangs at this step.
@artisticcheese this should have been fixed in 1.18.1.
can you provide logs using at least -v=5?
one option is to just delete the coredns deployment and reapply it with the 1.18.1 kubeadm binary using kubeadm init phase addons dns --config myconfig.yaml, which should then make the upgrade apply pass, in theory.
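(A rough sketch of that recovery path, using the subcommand spelling that gets sorted out later in the thread, addon coredns rather than addons dns, and assuming a config file that matches the original kubeadm init configuration:)
# remove the stuck CoreDNS deployment and let kubeadm re-create it from the saved cluster configuration
kubectl -n kube-system delete deployment coredns
kubeadm init phase addon coredns --config clusterconfig.yaml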
Here is the log:
I0409 13:23:29.767850 43075 apply.go:112] [upgrade/apply] verifying health of cluster
I0409 13:23:29.767931 43075 apply.go:113] [upgrade/apply] retrieving configuration from cluster
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
I0409 13:23:29.808811 43075 common.go:123] running preflight checks
[preflight] Running pre-flight checks.
I0409 13:23:29.809039 43075 preflight.go:79] validating if there are any unsupported CoreDNS plugins in the Corefile
I0409 13:23:29.817934 43075 preflight.go:105] validating if migration can be done for the current CoreDNS release.
[upgrade] Running cluster health checks
I0409 13:23:29.823619 43075 health.go:158] Creating Job "upgrade-health-check" in the namespace "kube-system"
I0409 13:23:29.838292 43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:30.841703 43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:31.841373 43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:32.841843 43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:33.841561 43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:34.841718 43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:35.841798 43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:36.841901 43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:37.842199 43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:38.842155 43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:39.841545 43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:40.841964 43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:41.841774 43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:42.841751 43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:43.841690 43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:44.841670 43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:44.843229 43075 health.go:188] Job "upgrade-health-check" in the namespace "kube-system" is not yet complete, retrying
I0409 13:23:44.843291 43075 health.go:201] Deleting Job "upgrade-health-check" in the namespace "kube-system"
I0409 13:23:44.857519 43075 apply.go:120] [upgrade/apply] validating requested and actual version
I0409 13:23:44.857603 43075 apply.go:136] [upgrade/version] enforcing version skew policies
[upgrade/version] You have chosen to change the cluster version to "v1.18.1"
[upgrade/versions] Cluster version: v1.18.1
[upgrade/versions] kubeadm version: v1.18.1
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
I0409 13:23:47.248348 43075 apply.go:152] [upgrade/apply] creating prepuller
[upgrade/prepull] Will prepull images for components [kube-apiserver kube-controller-manager kube-scheduler etcd]
[upgrade/prepull] Prepulling image for component etcd.
[upgrade/prepull] Prepulling image for component kube-apiserver.
[upgrade/prepull] Prepulling image for component kube-controller-manager.
[upgrade/prepull] Prepulling image for component kube-scheduler.
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-apiserver
[apiclient] Found 0 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-etcd
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-controller-manager
[apiclient] Found 1 Pods for label selector k8s-app=upgrade-prepull-kube-scheduler
I0409 13:23:50.126776 43075 request.go:557] Throttling request took 138.760855ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-etcd
I0409 13:23:50.726778 43075 request.go:557] Throttling request took 239.843395ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-etcd
I0409 13:23:50.926709 43075 request.go:557] Throttling request took 439.657374ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-controller-manager
I0409 13:23:51.126776 43075 request.go:557] Throttling request took 202.06138ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-apiserver
I0409 13:23:51.326772 43075 request.go:557] Throttling request took 343.494836ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-scheduler
I0409 13:23:51.526756 43075 request.go:557] Throttling request took 539.794114ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-etcd
I0409 13:23:51.726780 43075 request.go:557] Throttling request took 739.764794ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-controller-manager
I0409 13:23:51.926748 43075 request.go:557] Throttling request took 502.0449ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-apiserver
I0409 13:23:52.726819 43075 request.go:557] Throttling request took 239.705461ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-controller-manager
I0409 13:23:53.126753 43075 request.go:557] Throttling request took 143.454115ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-scheduler
I0409 13:23:53.327022 43075 request.go:557] Throttling request took 339.968298ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-etcd
I0409 13:23:53.527063 43075 request.go:557] Throttling request took 102.315694ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-apiserver
I0409 13:23:53.726768 43075 request.go:557] Throttling request took 243.508023ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-scheduler
I0409 13:23:53.926719 43075 request.go:557] Throttling request took 439.774103ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-etcd
I0409 13:23:54.126764 43075 request.go:557] Throttling request took 639.686487ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-controller-manager
I0409 13:23:54.326785 43075 request.go:557] Throttling request took 402.015468ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-apiserver
I0409 13:23:54.526876 43075 request.go:557] Throttling request took 543.423198ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-scheduler
I0409 13:23:54.727054 43075 request.go:557] Throttling request took 299.991874ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-apiserver
I0409 13:23:54.926736 43075 request.go:557] Throttling request took 439.423302ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-controller-manager
[upgrade/prepull] Prepulled image for component kube-controller-manager.
I0409 13:23:55.126776 43075 request.go:557] Throttling request took 639.446901ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-etcd
I0409 13:23:55.326769 43075 request.go:557] Throttling request took 402.067509ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-apiserver
[upgrade/prepull] Prepulled image for component kube-apiserver.
I0409 13:23:55.526736 43075 request.go:557] Throttling request took 543.315563ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-kube-scheduler
I0409 13:23:55.726767 43075 request.go:557] Throttling request took 239.76795ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dupgrade-prepull-etcd
[upgrade/prepull] Prepulled image for component kube-scheduler.
[upgrade/prepull] Prepulled image for component etcd.
[upgrade/prepull] Successfully prepulled the images for all the control plane components
I0409 13:23:56.494800 43075 apply.go:163] [upgrade/apply] performing upgrade
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.18.1"...
Static pod: kube-apiserver-master1 hash: d8fa3cb5202e83fad7b470065625baf6
Static pod: kube-controller-manager-master1 hash: f3307a0d84982d3c0a356d4d4bacd7d9
Static pod: kube-scheduler-master1 hash: 2d7e8eb6d5a1f262a0ebe72a1c048ff6
I0409 13:23:56.534348 43075 etcd.go:178] retrieving etcd endpoints from "kubeadm.kubernetes.io/etcd.advertise-client-urls" annotation in etcd Pods
I0409 13:23:56.726752 43075 request.go:557] Throttling request took 192.318699ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/pods?labelSelector=component%3Detcd%2Ctier%3Dcontrol-plane
I0409 13:23:56.731445 43075 etcd.go:192] etcd Pod "etcd-master1" is missing the "kubeadm.kubernetes.io/etcd.advertise-client-urls" annotation; cannot infer etcd advertise client URL using the Pod annotation
I0409 13:23:56.731652 43075 etcd.go:202] retrieving etcd endpoints from the cluster status
I0409 13:23:56.926752 43075 request.go:557] Throttling request took 194.866102ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s
I0409 13:23:56.929805 43075 etcd.go:102] etcd endpoints read from pods: https://10.0.0.4:2379
I0409 13:23:56.941991 43075 etcd.go:250] etcd endpoints read from etcd: https://10.0.0.4:2379
I0409 13:23:56.942233 43075 etcd.go:120] update etcd endpoints: https://10.0.0.4:2379
[upgrade/etcd] Upgrading to TLS for etcd
[upgrade/etcd] Non fatal issue encountered during upgrade: the desired etcd version for this Kubernetes version "v1.18.1" is "3.4.3-0", but the current etcd version is "3.4.3". Won't downgrade etcd, instead just continue
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests799019872"
I0409 13:23:57.348548 43075 manifests.go:41] [control-plane] creating static Pod files
I0409 13:23:57.348557 43075 manifests.go:91] [control-plane] getting StaticPodSpecs
W0409 13:23:57.348682 43075 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
I0409 13:23:57.349085 43075 manifests.go:104] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I0409 13:23:57.349103 43075 manifests.go:104] [control-plane] adding volume "etc-ca-certificates" for component "kube-apiserver"
I0409 13:23:57.349111 43075 manifests.go:104] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I0409 13:23:57.349118 43075 manifests.go:104] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-apiserver"
I0409 13:23:57.349123 43075 manifests.go:104] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-apiserver"
I0409 13:23:57.355099 43075 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests799019872/kube-apiserver.yaml"
I0409 13:23:57.355129 43075 manifests.go:104] [control-plane] adding volume "ca-certs" for component "kube-controller-manager"
I0409 13:23:57.355138 43075 manifests.go:104] [control-plane] adding volume "etc-ca-certificates" for component "kube-controller-manager"
I0409 13:23:57.355143 43075 manifests.go:104] [control-plane] adding volume "flexvolume-dir" for component "kube-controller-manager"
I0409 13:23:57.355148 43075 manifests.go:104] [control-plane] adding volume "k8s-certs" for component "kube-controller-manager"
I0409 13:23:57.355154 43075 manifests.go:104] [control-plane] adding volume "kubeconfig" for component "kube-controller-manager"
I0409 13:23:57.355159 43075 manifests.go:104] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-controller-manager"
I0409 13:23:57.355165 43075 manifests.go:104] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-controller-manager"
I0409 13:23:57.356038 43075 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests799019872/kube-controller-manager.yaml"
I0409 13:23:57.356062 43075 manifests.go:104] [control-plane] adding volume "kubeconfig" for component "kube-scheduler"
I0409 13:23:57.356726 43075 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests799019872/kube-scheduler.yaml"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Current and new manifests of kube-apiserver are equal, skipping upgrade
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Current and new manifests of kube-controller-manager are equal, skipping upgrade
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Current and new manifests of kube-scheduler are equal, skipping upgrade
I0409 13:23:57.735769 43075 apply.go:169] [upgrade/postupgrade] upgrading RBAC rules and addons
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
I0409 13:23:57.926792 43075 request.go:557] Throttling request took 128.366533ms, request: PUT:https://10.0.0.4:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.18?timeout=10s
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.18" ConfigMap in the kube-system namespace
I0409 13:23:58.126783 43075 request.go:557] Throttling request took 166.672072ms, request: GET:https://10.0.0.4:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.18?timeout=10s
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
I0409 13:23:58.130594 43075 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "master1" as an annotation
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
I0409 13:24:28.336865 43075 clusterinfo.go:79] creating the RBAC rules for exposing the cluster-info ConfigMap in the kube-public namespace
[addons] Applied essential addon: CoreDNS
I'm not sure I understand what I should do in the meantime.
as i said you can:
kubeadm init phase addons dns --config myconfig.yaml
I assume config.yaml is created by kubectl get deployment coredns -n kube-system -o yaml ?
root@master1:~# kubeadm init phase addons dns --config config.yaml
unknown flag: --config
To see the stack trace of this error execute with --v=5 or higher
--config should point to the config file you used for --config when you called kubeadm init.
you can download the clusterconfiguration like so:
kubectl get cm -n kube-system kubeadm-config -o yaml > config.yaml
and then use config.yaml with ...addons dns.
The kubeadm command is not correct. It does not like --config:
root@master1:~# kubeadm init phase addons dns --config clusterconfig.yaml
unknown flag: --config
To see the stack trace of this error execute with --v=5 or higher
I assume it was supposed to be addon coredns instead of addons dns:
root@master1:~# kubeadm init phase addon coredns --config clusterconfig.yaml
W0409 14:13:27.362828 66305 strict.go:47] unknown configuration schema.GroupVersionKind{Group:"", Version:"v1", Kind:"ConfigMap"} for scheme definitions in "k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/scheme/scheme.go:31" and "k8s.io/kubernetes/cmd/kubeadm/app/componentconfigs/scheme.go:28"
no InitConfiguration or ClusterConfiguration kind was found in the YAML file
To see the stack trace of this error execute with --v=5 or higher
My clusterconfig is below
root@master1:~# cat clusterconfig.yaml
apiVersion: v1
data:
  ClusterConfiguration: |
    apiServer:
      extraArgs:
        authorization-mode: Node,RBAC
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta2
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      local:
        dataDir: /var/lib/etcd
    imageRepository: k8s.gcr.io
    kind: ClusterConfiguration
    kubernetesVersion: v1.18.1
    networking:
      dnsDomain: cluster.local
      podSubnet: 10.244.0.0/16
      serviceSubnet: 10.96.0.0/12
    scheduler: {}
  ClusterStatus: |
    apiEndpoints:
      master1:
        advertiseAddress: 10.0.0.4
        bindPort: 6443
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterStatus
kind: ConfigMap
metadata:
  creationTimestamp: "2020-03-15T23:35:26Z"
  name: kubeadm-config
  namespace: kube-system
  resourceVersion: "2053412"
  selfLink: /api/v1/namespaces/kube-system/configmaps/kubeadm-config
  uid: 7d47e6b6-043f-41a9-a9e9-53600efee7df
my mistake, in addition to the get cm you also needed to extract only the ClusterConfiguration part from the ConfigMap.
just try copy-pasting this into a file:
apiServer:
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.18.1
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}
then use it with kubeadm init phase addon coredns --config clusterconfig.yaml
the purpose of passing a config is to preserve the setup you used on kubeadm init originally...
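(an alternative to copy-pasting, assuming the ConfigMap layout shown above, is to extract just the ClusterConfiguration key directly:)
kubectl -n kube-system get cm kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' > clusterconfig.yaml
kubeadm init phase addon coredns --config clusterconfig.yaml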
Not sure why this is marked as fixed, since it actually requires ALL nodes to be drained, not only the control plane. That's what hangs the process on a single-node control plane. I have a worker node that is still schedulable, and hence I guess the original fix never ran.
This fixed it https://github.com/kubernetes/kubeadm/issues/2035#issuecomment-607330985
Not sure why this is marked as fixed, since it actually requires ALL nodes to be drained, not only the control plane.
well, this is what the kubeadm upgrade guide requires:
https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
cc @rajansandeep perhaps you have a better idea what is going on here?
Yes, let me take a look.
I'd like to try and reproduce this first.
For me it was a single-node control plane with a single Windows worker node, created on 1.17.4 originally, and I was trying to upgrade it to 1.18.1.
I am also getting the error: when running a single control plane cluster, kubeadm upgrade hangs after printing [addons] Applied essential addon: CoreDNS. I am trying to upgrade from v1.17.2 to v1.17.4.
I was unable to reproduce the issue:
I tried kubeadm upgrade from 1.17.4->1.18.1 following the instructions from https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/ with 2 scenarios:
One with a single-node control plane and a single worker node,
and the second with just a single node.
In both cases, I was successfully able to upgrade kubeadm with no hang issues.
Note that this was a base kubeadm install with no modifications.
I reproduced this using a single-node k8s cluster (one node serves as both control plane and worker) when upgrading from v1.15.0 to v1.16.12. I ran the following commands to apply the upgrade:
apt-get update && \
apt-get install -y --allow-change-held-packages kubeadm=1.16.12-00
kubeadm upgrade plan
kubectl drain test-node --ignore-daemonsets --delete-local-data
kubeadm upgrade apply v1.16.12
After running the upgrade apply command, the CLI got stuck on "[addons] Applied essential addon: CoreDNS".
Running the upgrade apply command with -v=10 indicates kubeadm is waiting for a CoreDNS pod to start. However, this is a single-node cluster and its only node is already drained, so no CoreDNS pod can be started. I wonder if this is the root cause of the issue.
I did an experiment to prove my assumption: when the upgrade process got stuck on "[addons] Applied essential addon: CoreDNS", I uncordoned the only node in the cluster, and the upgrade then finished successfully.
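(For what it's worth, the scheduling failure is visible directly from kubectl while the node is still cordoned; illustrative commands, with test-node being the node name from the drain command above:)
# the new CoreDNS pods stay Pending because the only node is unschedulable
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system describe pods -l k8s-app=kube-dns | grep -A3 Events
# uncordoning the node lets them schedule and the upgrade completes
kubectl uncordon test-node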
Same here. I tried to upgrade from v1.18.5 to v1.19.3 but got stuck with a timeout on
[addons] Applied essential addon: kube-proxy
Draining all nodes and upgrading each one individually was successful.
That also helped for me here.
oddly we do not see this problem in CI / e2e.
but we have minor pending changes to the upgrade docs related to this:
https://github.com/kubernetes/website/pull/24704
summary: drain / uncordon should be done around the kubelet restart for CP nodes.
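(a rough sketch of that ordering for a single control plane node; versions and node name are placeholders:)
# upgrade the control plane while the node is still schedulable
kubeadm upgrade apply v1.18.x
# drain only around the kubelet upgrade and restart
kubectl drain <cp-node> --ignore-daemonsets
apt-get install -y --allow-change-held-packages kubelet=<version> kubectl=<version>
systemctl daemon-reload && systemctl restart kubelet
kubectl uncordon <cp-node>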
I'm reproducing this issue from my Chef cookbook when trying to upgrade from 1.16 to 1.17 on Debian Buster (while the same passes on Debian Stretch).
Someone on SO said that the nodeRegistration.name needs to be the same as the machine's hostname. I don't have any output from the command kubectl -n kube-system get cm kubeadm-config -o jsonpath={.data.MasterConfiguration}, which I guess is from an old Kubernetes version and therefore doesn't work anymore.
How can I check what's the nodeRegistration.name, or its equivalent from today?
How can I check what's the nodeRegistration.name, or its equivalent from today?
.name is passed to the kubelet and that is how the node is registered.
so it's equivalent to the node name.
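(to check, compare the registered node name with the machine's hostname; illustrative commands:)
kubectl get nodes -o name
hostnamectl --static
# the ClusterConfiguration kubeadm stored at init time is also visible here:
kubectl -n kube-system get cm kubeadm-config -o yaml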
Thank you @neolit123 for your comment.
On my side, I've finished testing each version of Kubernetes (from 1.15 to 1.20), and 1.17, 1.19, and 1.20 are failing on Debian Buster. On Stretch all of them pass.
I will now test the same but with Ubuntu 18.04.