Cluster-api: KCP conditions need to indicate when it fails to clone linked templates

Created on 27 Aug 2020 · 9 Comments · Source: kubernetes-sigs/cluster-api

What steps did you take and what happened:

Spun up a cluster with a bad public ssh key.
Saw no activity.
Eventually I looked in the KCP controller logs:

:~$ kubectl -n capi-kubeadm-control-plane-system logs deploy/capi-kubeadm-control-plane-controller-manager -c manager

The logs made it clear that this was a missing-SSH-key issue. Because the admission controller denied the request, no new machines were being created, which resulted in a starvation scenario during cluster bootstrapping that isn't detectable via the conditions API.

What did you expect to happen:

I would be able to see this in the conditions of one of the API objects (kubectl get cluster --all-namespaces | grep -i conditions -A 10 might show something, or maybe the conditions of another object would).

Anything else you would like to add:

Environment:

help wanted kind/bug priority/important-longterm

All 9 comments

cc @ncdc

/retitle KCP conditions need to indicate when it fails to clone the infra machine template

Renamed it to include bootstrap as well

/help
/milestone v0.4.0
/priority important-longterm

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

/assign

Just a note here.
Handling the cloning process is the responsibility of the infrastructure provider, so the cloning error should surface first as a condition on the infrastructure machine. That condition should then automatically go up the chain and be reported by KCP in the KCP machineReady condition.
FYI, in CAPV we are already addressing a similar problem.
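The "go up the chain" idea above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Cluster API code: the `Condition` and `Machine` types, the `summarizeMachines` function, and the condition type and reason strings are all assumptions made for the example.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a Cluster API condition.
type Condition struct {
	Type    string
	Status  bool
	Reason  string
	Message string
}

// Machine is a hypothetical, trimmed-down machine carrying its own conditions.
type Machine struct {
	Name       string
	Conditions []Condition
}

// summarizeMachines folds each machine's "Ready" condition into a single
// control-plane-level condition. The first machine that is not ready
// supplies the reason and message, so the failure is visible one level up.
func summarizeMachines(machines []Machine) Condition {
	for _, m := range machines {
		for _, c := range m.Conditions {
			if c.Type == "Ready" && !c.Status {
				return Condition{
					Type:    "MachinesReady", // illustrative name
					Status:  false,
					Reason:  c.Reason,
					Message: m.Name + ": " + c.Message,
				}
			}
		}
	}
	return Condition{Type: "MachinesReady", Status: true}
}

func main() {
	machines := []Machine{
		{Name: "cp-0", Conditions: []Condition{{Type: "Ready", Status: true}}},
		{Name: "cp-1", Conditions: []Condition{{
			Type:    "Ready",
			Status:  false,
			Reason:  "CloningFailed", // illustrative reason
			Message: "invalid SSH key",
		}}},
	}
	c := summarizeMachines(machines)
	fmt.Printf("%s=%v reason=%s message=%q\n", c.Type, c.Status, c.Reason, c.Message)
}
```

Note that this kind of aggregation only helps once a machine object exists; it would not cover a failure that prevents the object from being created at all.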

IIRC, this is about a failure to create a new CR from the infra (or bootstrap) template in the KCP spec. I don't think it's related to VM cloning.
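For the case described here, where creating the new CR from the template is itself rejected, one option is for the controller to record the error as a condition on the KCP object rather than only logging it. The sketch below is an assumption-laden illustration, not the project's implementation: `ControlPlane`, `createFromTemplate`, `reconcileClone`, and the `TemplateCloned` condition name are all hypothetical.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a Cluster API condition.
type Condition struct {
	Status  bool
	Reason  string
	Message string
}

// ControlPlane is a hypothetical, trimmed-down KCP object.
type ControlPlane struct {
	Conditions map[string]Condition
}

// createFromTemplate imitates creating a new CR from a template. When valid
// is false it returns an error, standing in for an admission denial.
func createFromTemplate(kind string, valid bool) error {
	if !valid {
		return fmt.Errorf("admission denied creating %s from template", kind)
	}
	return nil
}

// reconcileClone tries to create the object and records the outcome as a
// condition on the control plane, so the failure is visible without
// reading the controller logs.
func reconcileClone(cp *ControlPlane, kind string, valid bool) error {
	if err := createFromTemplate(kind, valid); err != nil {
		cp.Conditions["TemplateCloned"] = Condition{
			Status:  false,
			Reason:  kind + "CloneFailed", // illustrative reason
			Message: err.Error(),
		}
		return err
	}
	cp.Conditions["TemplateCloned"] = Condition{Status: true}
	return nil
}

func main() {
	cp := &ControlPlane{Conditions: map[string]Condition{}}
	if err := reconcileClone(cp, "InfrastructureMachine", false); err != nil {
		c := cp.Conditions["TemplateCloned"]
		fmt.Printf("status=%v reason=%s message=%q\n", c.Status, c.Reason, c.Message)
	}
}
```

The same path would apply to the bootstrap template by passing a different `kind`, which matches the retitled scope of the issue.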
