What steps did you take and what happened:
Created a new cluster with 3 control plane nodes. Erroneously scaled to 4, having forgotten about etcd quorum rules (an even etcd member count adds no fault tolerance, since quorum for n members is floor(n/2)+1). The KubeadmControlPlane entered a bad state.
What did you expect to happen:
The validating webhook should not allow unsupported replica counts.
Environment:
- Kubernetes version (use kubectl version): 1.17.11
- OS (e.g. from /etc/os-release): FCOS 32
The scaling to 4 occurs at 21:36. Prior to this, the Machines, Cluster, and KubeadmControlPlane are all nominal. After scaling, the 3 replicas show as UNAVAILABLE in the KubeadmControlPlane object and a new Machine never appears. Attempting to scale to 5 afterwards does nothing; the 3 replicas stay UNAVAILABLE.
I0107 21:25:03.681311 1 scale.go:193] controllers/KubeadmControlPlane "msg"="Waiting for control plane to pass preflight checks" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "failures"="[machine onprem-2100213-kr8gb does not have APIServerPodHealthy condition, machine onprem-2100213-kr8gb does not have ControllerManagerPodHealthy condition, machine onprem-2100213-kr8gb does not have SchedulerPodHealthy condition, machine onprem-2100213-kr8gb does not have EtcdPodHealthy condition, machine onprem-2100213-kr8gb does not have EtcdMemberHealthy condition]"
I0107 21:25:10.098125 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:25:57.983523 1 controller.go:355] controllers/KubeadmControlPlane "msg"="Scaling up control plane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "Desired"=3 "Existing"=2
I0107 21:25:57.983788 1 scale.go:193] controllers/KubeadmControlPlane "msg"="Waiting for control plane to pass preflight checks" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "failures"="[machine onprem-2100213-lsm8r reports APIServerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine onprem-2100213-lsm8r reports ControllerManagerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine onprem-2100213-lsm8r reports SchedulerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine onprem-2100213-lsm8r reports EtcdPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine onprem-2100213-kr8gb reports APIServerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine onprem-2100213-kr8gb reports ControllerManagerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine onprem-2100213-kr8gb reports SchedulerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine onprem-2100213-kr8gb reports EtcdPodHealthy condition is unknown (Failed to get the node which is hosting this component)]"
I0107 21:25:58.554330 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:26:01.020837 1 controller.go:355] controllers/KubeadmControlPlane "msg"="Scaling up control plane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "Desired"=3 "Existing"=2
I0107 21:26:01.651916 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:26:05.507269 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:26:09.448376 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:26:13.213617 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:26:21.307608 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:26:49.408640 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:26:52.751657 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:26:56.760917 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:27:09.965862 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:27:14.906036 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:27:20.489216 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:27:25.235404 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:36:26.267999 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:36:26.866256 1 controller.go:355] controllers/KubeadmControlPlane "msg"="Scaling up control plane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "Desired"=4 "Existing"=3
I0107 21:36:26.866569 1 scale.go:193] controllers/KubeadmControlPlane "msg"="Waiting for control plane to pass preflight checks" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "failures"="[machine onprem-2100213-lsm8r reports APIServerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-lsm8r reports ControllerManagerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-lsm8r reports SchedulerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-lsm8r reports EtcdPodHealthy condition is false (Error, Missing node), machine onprem-2100213-t9n4c reports APIServerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-t9n4c reports ControllerManagerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-t9n4c reports SchedulerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-t9n4c reports EtcdPodHealthy condition is false (Error, Missing node), machine onprem-2100213-kr8gb reports APIServerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-kr8gb reports ControllerManagerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-kr8gb reports SchedulerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-kr8gb reports EtcdPodHealthy condition is false (Error, Missing node)]"
I0107 21:36:27.360999 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:36:27.902548 1 controller.go:355] controllers/KubeadmControlPlane "msg"="Scaling up control plane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "Desired"=4 "Existing"=3
I0107 21:36:27.902831 1 scale.go:193] controllers/KubeadmControlPlane "msg"="Waiting for control plane to pass preflight checks" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "failures"="[machine onprem-2100213-kr8gb reports APIServerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-kr8gb reports ControllerManagerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-kr8gb reports SchedulerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-kr8gb reports EtcdPodHealthy condition is false (Error, Missing node), machine onprem-2100213-lsm8r reports APIServerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-lsm8r reports ControllerManagerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-lsm8r reports SchedulerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-lsm8r reports EtcdPodHealthy condition is false (Error, Missing node), machine onprem-2100213-t9n4c reports APIServerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-t9n4c reports ControllerManagerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-t9n4c reports SchedulerPodHealthy condition is false (Error, Missing node), machine onprem-2100213-t9n4c reports EtcdPodHealthy condition is false (Error, Missing node)]"
I0107 21:36:28.407161 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213"
I0107 21:36:31.107266 1 controller.go:355] controllers/KubeadmControlPlane "msg"="Scaling up control plane" "cluster"="onprem-2100213" "kubeadmControlPlane"="onprem-2100213" "namespace"="k8s-capi-ci-2100213" "Desired"=4 "Existing"=3
kubectl -n k8s-capi-ci-2100213 get machines
NAME                                     PROVIDERID                                       PHASE     VERSION
onprem-2100213-kr8gb                     vsphere://42216d47-6549-edaa-93c4-4ef6c6695f9c   Running   v1.17.11
onprem-2100213-lsm8r                     vsphere://42213d85-6725-ce0e-096a-de4cb05e31b6   Running   v1.17.11
onprem-2100213-t9n4c                     vsphere://4221d556-d6f0-62c3-2f45-df19ed90c834   Running   v1.17.11
onprem-2100213-worker-5f8f997cf8-9gl52   vsphere://42214a9f-4782-b886-5987-9c71ebf0e3d3   Running   v1.17.11
onprem-2100213-worker-5f8f997cf8-b9vgv   vsphere://4221201d-8a93-33e2-b29b-3b44a6b99827   Running   v1.17.11
onprem-2100213-worker-5f8f997cf8-d7drk   vsphere://422156ee-6c2c-044b-5289-1c191a28a41c   Running   v1.17.11
kubectl -n k8s-capi-ci-2100213 get cluster
NAME             PHASE
onprem-2100213   Provisioned
kubectl --kubeconfig ~/.kube/onprem-2100213.kubeconfig get pods -n kube-system
NAME                                           READY   STATUS    RESTARTS   AGE
coredns-74589489bd-dfzzw                       1/1     Running   0          46m
coredns-74589489bd-zv6tr                       1/1     Running   0          46m
etcd-onprem-2100213-kr8gb                      1/1     Running   0          44m
etcd-onprem-2100213-lsm8r                      1/1     Running   0          46m
etcd-onprem-2100213-t9n4c                      1/1     Running   0          43m
kube-apiserver-onprem-2100213-kr8gb            1/1     Running   0          45m
kube-apiserver-onprem-2100213-lsm8r            1/1     Running   0          46m
kube-apiserver-onprem-2100213-t9n4c            1/1     Running   0          43m
kube-controller-manager-onprem-2100213-kr8gb   1/1     Running   0          45m
kube-controller-manager-onprem-2100213-lsm8r   1/1     Running   1          46m
kube-controller-manager-onprem-2100213-t9n4c   1/1     Running   0          43m
kube-flannel-cqrvz                             1/1     Running   0          45m
kube-flannel-fcbf8                             1/1     Running   0          45m
kube-flannel-fzf4z                             1/1     Running   0          45m
kube-flannel-hhkvr                             1/1     Running   0          43m
kube-flannel-jcq6h                             1/1     Running   2          45m
kube-flannel-r8c5z                             1/1     Running   0          46m
kube-proxy-gz54d                               1/1     Running   0          45m
kube-proxy-lvtdz                               1/1     Running   0          45m
kube-proxy-mmxmr                               1/1     Running   0          43m
kube-proxy-rsdz6                               1/1     Running   0          46m
kube-proxy-v4mb6                               1/1     Running   0          45m
kube-proxy-wcz6w                               1/1     Running   0          45m
kube-scheduler-onprem-2100213-kr8gb            1/1     Running   0          45m
kube-scheduler-onprem-2100213-lsm8r            1/1     Running   1          46m
kube-scheduler-onprem-2100213-t9n4c            1/1     Running   0          43m
kube-vip-onprem-2100213-kr8gb                  1/1     Running   0          45m
kube-vip-onprem-2100213-lsm8r                  1/1     Running   1          46m
kube-vip-onprem-2100213-t9n4c                  1/1     Running   0          43m
metrics-server-cd4b475f6-f42pk                 1/1     Running   0          41m
metrics-server-cd4b475f6-f4vfx                 1/1     Running   0          41m
vsphere-csi-controller-ccfc7955b-q6bbk         5/5     Running   3          46m
vsphere-csi-node-4mjfl                         3/3     Running   0          46m
vsphere-csi-node-52nqs                         3/3     Running   0          45m
vsphere-csi-node-cbwl2                         3/3     Running   0          45m
vsphere-csi-node-ckllc                         3/3     Running   0          45m
vsphere-csi-node-fq829                         3/3     Running   0          45m
vsphere-csi-node-z4vl2                         3/3     Running   0          43m
kubectl -n k8s-capi-ci-2100213 get kubeadmcontrolplanes.controlplane.cluster.x-k8s.io
NAME             INITIALIZED   API SERVER AVAILABLE   VERSION    REPLICAS   READY   UPDATED   UNAVAILABLE
onprem-2100213   true          true                   v1.17.11   3                  3         3
Thank you!
/kind bug
After discussing on Slack, we found the "unavailable" control plane issue is due to https://github.com/kubernetes-sigs/cluster-api/issues/3961, which is fixed in v0.3.12.
However, KCP should not allow scaling to an even number of replicas. Something is not working as expected in https://github.com/kubernetes-sigs/cluster-api/blob/v0.3.11/controlplane/kubeadm/api/v1alpha3/kubeadm_control_plane_webhook.go#L235-L252
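For reference, the linked check is meant to forbid even replica counts whenever etcd is managed (stacked). A paraphrased sketch of that logic, not a verbatim copy; the helper name validateEtcdReplicas is invented here for illustration:

```go
// Paraphrased sketch of the even-replica check linked above (v1alpha3 API).
package v1alpha3

import "k8s.io/apimachinery/pkg/util/validation/field"

func (in *KubeadmControlPlane) validateEtcdReplicas() field.ErrorList {
	var allErrs field.ErrorList

	// External etcd is exempt: KCP does not manage quorum in that case.
	externalEtcd := in.Spec.KubeadmConfigSpec.ClusterConfiguration != nil &&
		in.Spec.KubeadmConfigSpec.ClusterConfiguration.Etcd.External != nil

	// With managed (stacked) etcd, an even replica count is forbidden:
	// quorum for n members is n/2+1, so 4 members tolerate no more
	// failures than 3 while being more likely to lose quorum.
	if !externalEtcd && in.Spec.Replicas != nil && *in.Spec.Replicas%2 == 0 {
		allErrs = append(allErrs, field.Forbidden(
			field.NewPath("spec", "replicas"),
			"cannot be an even number when using managed etcd",
		))
	}
	return allErrs
}
```

This runs in the CREATE/UPDATE admission path for the KubeadmControlPlane resource itself, which is why the error below shows up under kubectl edit.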
To repro, kubectl scale kcp --replicas 4
Confirmed that the webhook prevents setting the number of replicas to 4 via kubectl edit, but not via kubectl scale:
➜ k edit kcp
error: kubeadmcontrolplanes.controlplane.cluster.x-k8s.io "my-cluster-control-plane" could not be patched: admission webhook "validation.kubeadmcontrolplane.controlplane.cluster.x-k8s.io" denied the request: KubeadmControlPlane.controlplane.cluster.x-k8s.io "my-cluster-control-plane" is invalid: spec.replicas: Forbidden: cannot be an even number when using managed etcd
You can run `kubectl replace -f /var/folders/11/f41_16k56w1cxnpw5jwjks0h0000gn/T/kubectl-edit-kpipf.yaml` to try this update again.
➜ k scale kcp my-cluster-control-plane --replicas 4
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/my-cluster-control-plane scaled
Ah! The scale subresource might be going around the validation webhook :/
From https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers, it seems we need to amend our webhook registration to include the subresources.
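As a sketch of what that could look like (not the merged fix; ScaleValidator and its wiring are illustrative names): the ValidatingWebhookConfiguration rule would additionally list kubeadmcontrolplanes/scale under resources, and because the API server sends an autoscaling/v1 Scale object for /scale requests, the existing KCP validator cannot be reused as-is; a dedicated handler has to decode the Scale:

```go
// Sketch of a scale-subresource validator, assuming the webhook is also
// registered with resources: ["kubeadmcontrolplanes/scale"].
package webhooks

import (
	"context"
	"fmt"
	"net/http"

	autoscalingv1 "k8s.io/api/autoscaling/v1"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

// ScaleValidator validates requests to the /scale subresource of KubeadmControlPlane.
type ScaleValidator struct {
	decoder *admission.Decoder
}

// Handle rejects scale requests that would set an even replica count.
func (v *ScaleValidator) Handle(ctx context.Context, req admission.Request) admission.Response {
	scale := &autoscalingv1.Scale{}
	if err := v.decoder.Decode(req, scale); err != nil {
		return admission.Errored(http.StatusBadRequest, err)
	}
	// Mirror the rule the resource-level webhook already enforces. Checking
	// for external etcd would require fetching the KubeadmControlPlane
	// object itself; elided in this sketch.
	if scale.Spec.Replicas%2 == 0 {
		return admission.Denied(fmt.Sprintf(
			"replicas cannot be an even number when using managed etcd (got %d)",
			scale.Spec.Replicas))
	}
	return admission.Allowed("")
}
```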
/priority important-soon
/milestone v0.4.0
/area api
We should also backport the possible fix and add a test
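For instance, a unit test along these lines, matching the hypothetical ScaleValidator sketched above (assuming a controller-runtime version where admission.NewDecoder returns an error):

```go
package webhooks

import (
	"context"
	"encoding/json"
	"testing"

	admissionv1 "k8s.io/api/admission/v1"
	autoscalingv1 "k8s.io/api/autoscaling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

func TestScaleValidatorRejectsEvenReplicas(t *testing.T) {
	scheme := runtime.NewScheme()
	_ = autoscalingv1.AddToScheme(scheme)
	decoder, _ := admission.NewDecoder(scheme)

	// Build an admission request carrying the Scale object that the API
	// server would send for `kubectl scale ... --replicas 4`.
	raw, _ := json.Marshal(&autoscalingv1.Scale{
		TypeMeta: metav1.TypeMeta{APIVersion: "autoscaling/v1", Kind: "Scale"},
		Spec:     autoscalingv1.ScaleSpec{Replicas: 4},
	})
	req := admission.Request{AdmissionRequest: admissionv1.AdmissionRequest{
		Object: runtime.RawExtension{Raw: raw},
	}}

	v := &ScaleValidator{decoder: decoder}
	if resp := v.Handle(context.Background(), req); resp.Allowed {
		t.Errorf("expected scale to 4 replicas to be denied")
	}
}
```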
@yastij Would you be interested in fixing this?
Kind of related, it seems we're also not getting requests for the status subresource 🤔
That seems a bit different; per the docs at https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#validatingwebhook-v1-admissionregistration-k8s-io, it seems we might need to add, for example, clusterresourcesets/* as an additional resource?
Yeah, I guess it should be related, as we only change replicas here.
/assign