Cluster-api: Scaling KubeadmControlPlane from 3 to 4 replicas is allowed by the validating webhook

Created on 8 Jan 2021 · 10 comments · Source: kubernetes-sigs/cluster-api

What steps did you take and what happened:

Created a new cluster with 3 control plane nodes, then erroneously scaled to 4, having forgotten about etcd quorum rules. The KubeadmControlPlane entered a bad state.

What did you expect to happen:

The validating webhook should not allow unsupported replica counts.
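For context, the etcd quorum arithmetic behind this expectation can be sketched as follows (illustrative Python, not CAPI code): quorum for an n-member etcd cluster is floor(n/2) + 1, so scaling from 3 to 4 members raises the quorum requirement without improving fault tolerance.

```python
def quorum(members: int) -> int:
    """Majority of members needed for the etcd cluster to accept writes."""
    return members // 2 + 1

def fault_tolerance(members: int) -> int:
    """How many members can fail while the cluster stays available."""
    return members - quorum(members)

for n in (3, 4, 5):
    # 3 members tolerate 1 failure; 4 members still tolerate only 1,
    # while needing 3 healthy members instead of 2 for quorum.
    print(f"members={n} quorum={quorum(n)} tolerates={fault_tolerance(n)}")
```

This is why odd replica counts are the only supported sizes for managed etcd.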

Environment:

  • Cluster-api version: 0.3.11
  • CAPV version: 0.7.1
  • Kubernetes version: (use kubectl version): 1.17.11
  • OS (e.g. from /etc/os-release): FCOS32

Details

The scaling to 4 occurs at 21:36. Prior to this, the machines, cluster, and kubeadmcontrolplane are all nominal. After scaling, the 3 replicas show as UNAVAILABLE in the KubeadmControlPlane object and a new Machine never appears. Attempting to scale to 5 afterwards does nothing; the 3 replicas stay UNAVAILABLE.

I0107 21:25:03.681311       1 scale.go:193] controllers/KubeadmControlPlane "msg"="Waiting for control plane to pass preflight checks" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "failures"="[machine onprem-2100213-kr8gb does not have APIServerPodHealthy condition, machine onprem-2100213-kr8gb does not have ControllerManagerPodHealthy condition, machine onprem-2100213-kr8gb does not have SchedulerPodHealthy condition, machine onprem-2100213-kr8gb does not have EtcdPodHealthy condition, machine onprem-2100213-kr8gb does not have EtcdMemberHealthy condition]"
I0107 21:25:10.098125       1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:25:57.983523       1 controller.go:355] controllers/KubeadmControlPlane "msg"="Scaling up control plane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "Desired"=3 "Existing"=2
I0107 21:25:57.983788       1 scale.go:193] controllers/KubeadmControlPlane "msg"="Waiting for control plane to pass preflight checks" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "failures"="[machine onprem-2100213-lsm8r reports APIServerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine onprem-2100213-lsm8r reports ControllerManagerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine onprem-2100213-lsm8r reports SchedulerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine onprem-2100213-lsm8r reports EtcdPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine onprem-2100213-kr8gb reports APIServerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine onprem-2100213-kr8gb reports ControllerManagerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine onprem-2100213-kr8gb reports SchedulerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine onprem-2100213-kr8gb reports EtcdPodHealthy condition is unknown (Failed to get the node which is hosting this component)]"
I0107 21:25:58.554330       1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:26:01.020837       1 controller.go:355] controllers/KubeadmControlPlane "msg"="Scaling up control plane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "Desired"=3 "Existing"=2
I0107 21:26:01.651916       1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:26:05.507269       1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:26:09.448376       1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:26:13.213617       1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:26:21.307608       1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:26:49.408640       1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:26:52.751657       1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:26:56.760917       1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:27:09.965862       1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:27:14.906036       1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:27:20.489216       1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:27:25.235404       1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:36:26.267999       1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:36:26.866256       1 controller.go:355] controllers/KubeadmControlPlane "msg"="Scaling up control plane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "Desired"=4 "Existing"=3
I0107 21:36:26.866569       1 scale.go:193] controllers/KubeadmControlPlane "msg"="Waiting for control plane to pass preflight checks" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "failures"="[machine onprem-2100213-lsm8r reports APIServerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-lsm8r reports ControllerManagerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-lsm8r reports SchedulerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-lsm8r reports EtcdPodHealthy condition is false (Error, Missing node), machine onprem-2100213-t9n4c reports APIServerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-t9n4c reports ControllerManagerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-t9n4c reports SchedulerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-t9n4c reports EtcdPodHealthy condition is false (Error, Missing node), machine onprem-2100213-kr8gb reports APIServerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-kr8gb reports ControllerManagerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-kr8gb reports SchedulerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-kr8gb reports EtcdPodHealthy condition is false (Error, Missing node)]"
I0107 21:36:27.360999       1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:36:27.902548       1 controller.go:355] controllers/KubeadmControlPlane "msg"="Scaling up control plane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "Desired"=4 "Existing"=3
I0107 21:36:27.902831       1 scale.go:193] controllers/KubeadmControlPlane "msg"="Waiting for control plane to pass preflight checks" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "failures"="[machine onprem-2100213-kr8gb reports APIServerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-kr8gb reports ControllerManagerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-kr8gb reports SchedulerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-kr8gb reports EtcdPodHealthy condition is false (Error, Missing node), machine onprem-2100213-lsm8r reports APIServerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-lsm8r reports ControllerManagerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-lsm8r reports SchedulerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-lsm8r reports EtcdPodHealthy condition is false (Error, Missing node), machine onprem-2100213-t9n4c reports APIServerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-t9n4c reports ControllerManagerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-t9n4c reports SchedulerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-t9n4c reports EtcdPodHealthy condition is false (Error, Missing node)]"
I0107 21:36:28.407161       1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:36:31.107266       1 controller.go:355] controllers/KubeadmControlPlane "msg"="Scaling up control plane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "Desired"=4 "Existing"=3
kubectl -n k8s-capi-ci-2100213 get machines
NAME                                     PROVIDERID                                       PHASE     VERSION
onprem-2100213-kr8gb                     vsphere://42216d47-6549-edaa-93c4-4ef6c6695f9c   Running   v1.17.11
onprem-2100213-lsm8r                     vsphere://42213d85-6725-ce0e-096a-de4cb05e31b6   Running   v1.17.11
onprem-2100213-t9n4c                     vsphere://4221d556-d6f0-62c3-2f45-df19ed90c834   Running   v1.17.11
onprem-2100213-worker-5f8f997cf8-9gl52   vsphere://42214a9f-4782-b886-5987-9c71ebf0e3d3   Running   v1.17.11
onprem-2100213-worker-5f8f997cf8-b9vgv   vsphere://4221201d-8a93-33e2-b29b-3b44a6b99827   Running   v1.17.11
onprem-2100213-worker-5f8f997cf8-d7drk   vsphere://422156ee-6c2c-044b-5289-1c191a28a41c   Running   v1.17.11

kubectl -n k8s-capi-ci-2100213 get cluster
NAME             PHASE
onprem-2100213   Provisioned

kubectl --kubeconfig ~/.kube/onprem-2100213.kubeconfig get pods -n kube-system
NAME                                           READY   STATUS    RESTARTS   AGE
coredns-74589489bd-dfzzw                       1/1     Running   0          46m
coredns-74589489bd-zv6tr                       1/1     Running   0          46m
etcd-onprem-2100213-kr8gb                      1/1     Running   0          44m
etcd-onprem-2100213-lsm8r                      1/1     Running   0          46m
etcd-onprem-2100213-t9n4c                      1/1     Running   0          43m
kube-apiserver-onprem-2100213-kr8gb            1/1     Running   0          45m
kube-apiserver-onprem-2100213-lsm8r            1/1     Running   0          46m
kube-apiserver-onprem-2100213-t9n4c            1/1     Running   0          43m
kube-controller-manager-onprem-2100213-kr8gb   1/1     Running   0          45m
kube-controller-manager-onprem-2100213-lsm8r   1/1     Running   1          46m
kube-controller-manager-onprem-2100213-t9n4c   1/1     Running   0          43m
kube-flannel-cqrvz                             1/1     Running   0          45m
kube-flannel-fcbf8                             1/1     Running   0          45m
kube-flannel-fzf4z                             1/1     Running   0          45m
kube-flannel-hhkvr                             1/1     Running   0          43m
kube-flannel-jcq6h                             1/1     Running   2          45m
kube-flannel-r8c5z                             1/1     Running   0          46m
kube-proxy-gz54d                               1/1     Running   0          45m
kube-proxy-lvtdz                               1/1     Running   0          45m
kube-proxy-mmxmr                               1/1     Running   0          43m
kube-proxy-rsdz6                               1/1     Running   0          46m
kube-proxy-v4mb6                               1/1     Running   0          45m
kube-proxy-wcz6w                               1/1     Running   0          45m
kube-scheduler-onprem-2100213-kr8gb            1/1     Running   0          45m
kube-scheduler-onprem-2100213-lsm8r            1/1     Running   1          46m
kube-scheduler-onprem-2100213-t9n4c            1/1     Running   0          43m
kube-vip-onprem-2100213-kr8gb                  1/1     Running   0          45m
kube-vip-onprem-2100213-lsm8r                  1/1     Running   1          46m
kube-vip-onprem-2100213-t9n4c                  1/1     Running   0          43m
metrics-server-cd4b475f6-f42pk                 1/1     Running   0          41m
metrics-server-cd4b475f6-f4vfx                 1/1     Running   0          41m
vsphere-csi-controller-ccfc7955b-q6bbk         5/5     Running   3          46m
vsphere-csi-node-4mjfl                         3/3     Running   0          46m
vsphere-csi-node-52nqs                         3/3     Running   0          45m
vsphere-csi-node-cbwl2                         3/3     Running   0          45m
vsphere-csi-node-ckllc                         3/3     Running   0          45m
vsphere-csi-node-fq829                         3/3     Running   0          45m
vsphere-csi-node-z4vl2                         3/3     Running   0          43m

kubectl -n k8s-capi-ci-2100213 get kubeadmcontrolplanes.controlplane.cluster.x-k8s.io
NAME             INITIALIZED   API SERVER AVAILABLE   VERSION    REPLICAS   READY   UPDATED   UNAVAILABLE
onprem-2100213   true          true                   v1.17.11   3                  3         3

Thank you!

/kind bug

area/api kind/bug priority/important-soon

All 10 comments

After discussing on slack, we found the "unavailable" control plane issue is due to https://github.com/kubernetes-sigs/cluster-api/issues/3961, which is fixed in v0.3.12.

However, KCP should not allow scaling to even numbers. Something is not working as expected in https://github.com/kubernetes-sigs/cluster-api/blob/v0.3.11/controlplane/kubeadm/api/v1alpha3/kubeadm_control_plane_webhook.go#L235-L252

To repro, kubectl scale kcp --replicas 4

Confirmed that the webhook prevents editing the number of replicas to 4 when using kubectl edit, but not with kubectl scale:

➜  k edit kcp
error: kubeadmcontrolplanes.controlplane.cluster.x-k8s.io "my-cluster-control-plane" could not be patched: admission webhook "validation.kubeadmcontrolplane.controlplane.cluster.x-k8s.io" denied the request: KubeadmControlPlane.controlplane.cluster.x-k8s.io "my-cluster-control-plane" is invalid: spec.replicas: Forbidden: cannot be an even number when using managed etcd
You can run `kubectl replace -f /var/folders/11/f41_16k56w1cxnpw5jwjks0h0000gn/T/kubectl-edit-kpipf.yaml` to try this update again.
➜ k scale kcp my-cluster-control-plane --replicas 4
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/my-cluster-control-plane scaled

Ah! The scale subresource might be going around the validation webhook :/

From https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers, it seems we need to amend our webhooks registration to include the subresources
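Per that doc, subresources must be listed explicitly in the webhook's rules as resource/subresource. An illustrative fragment of what the registration would need to look like (hypothetical manifest, not the actual generated configuration):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: capi-kubeadm-control-plane-validating-webhook  # illustrative name
webhooks:
  - name: validation.kubeadmcontrolplane.controlplane.cluster.x-k8s.io
    rules:
      - apiGroups: ["controlplane.cluster.x-k8s.io"]
        apiVersions: ["v1alpha3"]
        operations: ["CREATE", "UPDATE"]
        resources:
          - kubeadmcontrolplanes
          - kubeadmcontrolplanes/scale   # without this, kubectl scale bypasses the webhook
```

Note that admission for a scale request delivers a Scale object rather than a KubeadmControlPlane, so the webhook handler would also need to understand that payload.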

/priority important-soon
/milestone v0.4.0
/area api

We should also backport the possible fix and add a test

@yastij Would you be interested in fixing this?

Kind of related, it seems we're also not getting requests for the status subresource 🤔

That seems a bit different; from the docs https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#validatingwebhook-v1-admissionregistration-k8s-io it seems we might need to add, for example, clusterresourcesets/* as an additional resource?

Yeah I guess it should be related as we only change replicas here.

/assign
