/kind bug
What steps did you take and what happened:
I installed Kubeflow on Google Kubernetes Engine following the "deploy using CLI" steps in the documentation. This succeeded and I was able to use Kubeflow. Afterwards I scaled the cluster down to 0 nodes for the weekend to reduce costs. After resizing the cluster back to 1 node, no pods could be created because of the inferenceservice.kfserving-webhook-server.pod-mutator webhook.
This webhook fires for every pod creation in a namespace that does not have a label with key control-plane, and it invokes the kfserving-webhook-server-service. But the pod that serves as the endpoint for that service cannot itself be created, because its creation is intercepted by the same webhook.
After digging into the webhook I found the selector to be:
namespaceSelector:
  matchExpressions:
  - key: control-plane
    operator: DoesNotExist
And the pods which become an endpoint have the following labels:
template:
  metadata:
    creationTimestamp: null
    labels:
      app.kubernetes.io/component: kfserving-install
      app.kubernetes.io/instance: kfserving-install-v0.7.0
      app.kubernetes.io/managed-by: kfctl
      app.kubernetes.io/name: kfserving-install
      app.kubernetes.io/part-of: kubeflow
      app.kubernetes.io/version: v0.7.0
      control-plane: kfserving-controller-manager
      controller-tools.k8s.io: "1.0"
      kustomize.component: kfserving
But these are pod labels, and the namespaceSelector only looks at labels on the namespace. Because the namespace itself has no control-plane label, the pods still match the selector. The creation of these pods is therefore intercepted by the mutatingwebhook, and an endpoint for kfserving-webhook-server-service is never created. This causes the mutatingwebhook to fail for all pod creations, leaving the whole cluster unable to function.
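The selector behaviour described above can be sketched as a small model. This is only an illustration of how a namespaceSelector with a DoesNotExist expression is evaluated against namespace labels, not the actual API server code. The function names are hypothetical, and the label value control-plane=kubeflow is an assumed example, not something taken from the Kubeflow manifests.

```python
# Simplified model of a webhook namespaceSelector (illustration only).

def matches_expression(labels, expr):
    """Evaluate one matchExpressions entry against a label dict."""
    key = expr["key"]
    op = expr["operator"]
    if op == "DoesNotExist":
        return key not in labels
    if op == "Exists":
        return key in labels
    raise ValueError(f"operator not modelled in this sketch: {op}")


def webhook_intercepts(namespace_labels, selector):
    """True if the webhook is called for pods created in this namespace.

    Only the namespace's labels are consulted; pod labels play no part.
    All expressions must hold (logical AND).
    """
    return all(
        matches_expression(namespace_labels, expr)
        for expr in selector["matchExpressions"]
    )


# Selector from the webhook configuration shown in this issue.
selector = {
    "matchExpressions": [{"key": "control-plane", "operator": "DoesNotExist"}]
}

# The pod template carries control-plane, but that is never checked.
pod_labels = {"control-plane": "kfserving-controller-manager"}

unlabelled_ns = {}                           # namespace without the label
labelled_ns = {"control-plane": "kubeflow"}  # assumed example label value

print(webhook_intercepts(unlabelled_ns, selector))  # webhook is called
print(webhook_intercepts(labelled_ns, selector))    # webhook is skipped
```

In this model, the webhook stops intercepting its own server pod only once the namespace carries the control-plane key. Whether labelling the namespace is the right fix for a real deployment is a separate question for the maintainers.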
After removing the mutatingwebhook the cluster is up and running again. My question is, what would be the right approach to solve this issue?
What did you expect to happen:
Resizing the cluster from 0 to 1 starts the cluster without having to remove the webhook.
Environment:
/etc/os-release): macOS Mojave 10.14.5
Issue Label Bot is not confident enough to auto-label this issue.
See dashboard for more details.
How do you remove the mutatingwebhook?
Duplicate of kubeflow/kfserving#568
You can delete the webhook like so (MutatingWebhookConfiguration is cluster-scoped, so no namespace flag is needed):
kubectl delete mutatingwebhookconfiguration inferenceservice.serving.kubeflow.org